CN108647495A - Personal status relationship identification method, device, equipment and storage medium - Google Patents


Info

Publication number
CN108647495A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810490416.4A
Other languages
Chinese (zh)
Other versions
CN108647495B (en)
Inventor
刘晶星
刘菲菲
庞柳
赵薇薇
于世辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kingmed Diagnostics Group Co ltd
Guangzhou Kingmed Diagnostics Central Co Ltd
Original Assignee
Guangzhou Kingmed Diagnostics Group Co ltd
Guangzhou Kingmed Diagnostics Central Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kingmed Diagnostics Group Co ltd and Guangzhou Kingmed Diagnostics Central Co Ltd
Priority to CN201810490416.4A
Publication of CN108647495A
Application granted
Publication of CN108647495B
Legal status: Active


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Abstract

The present invention relates to a personal status relationship identification method, device, equipment and storage medium that can improve the validity and reliability of identification results. To identify a personal status relationship, the method searches the mutation sites of a sample's mutation information for each of multiple purpose SNP sites in turn, obtaining mutation and sequencing information that includes the genotype of each purpose SNP site. It then selects, from the multiple purpose SNP sites, those that meet a preset requirement, together with their mutation and sequencing information, to obtain identification information. Finally, it compares the genotypes of corresponding purpose SNP sites in the identification information of different samples to identify the personal status relationship between the samples. Because the rate at which an SNP mutates to another base is extremely low, and because even a mutation at a single purpose SNP has only a limited effect on the final result, identifying personal status relationships by SNPs can significantly improve the validity and reliability of the identification result compared with the traditional method based on STR detection.

Description

Personal status relationship identification method, device, equipment and storage medium
Technical field
The present invention relates to the technical fields of molecular biology and bioinformatics, and more particularly to a personal status relationship identification method, device, equipment and storage medium.
Background art
At present, methods for identifying personal status relationships, such as individual identification and kinship identification (including paternity testing), are based mainly on the detection of STRs (short tandem repeats). Studies have found that STRs are far less numerous in the human genome than SNPs (single nucleotide polymorphisms), and that few STRs discriminate strongly between individuals. Once STRs whose frequencies fluctuate widely between populations are excluded, the remaining usable STRs are very limited. Moreover, a single STR mutates more readily than an SNP, and because so few STRs are available, even one mutation can have a large effect on the final identification result.
Summary of the invention
In view of the above technical problems, it is necessary to provide a personal status relationship identification method, device, equipment and storage medium that can improve the validity and reliability of identification results.
A personal status relationship identification method includes the following steps:
Step S1: Obtain the sample mutation information derived by aligning the sequencing result;
Step S2: Search the mutation sites of the sample mutation information for each of multiple purpose SNP sites in turn, obtaining mutation and sequencing information that includes the genotype of each purpose SNP site;
Step S3: Select, from the multiple purpose SNP sites, the purpose SNP sites that meet a preset requirement, together with their mutation and sequencing information, to obtain identification information;
Step S4: Compare the genotypes of corresponding purpose SNP sites in the identification information of different samples to identify the personal status relationship between the samples.
In one embodiment, each purpose SNP site is located on an autosomal exon or in a functional non-coding region, and its allele population frequency is between 0.45 and 0.55.
In one embodiment, in step S2, a purpose SNP that can be found among the mutation sites of the sample mutation information has a genotype that is either homozygous and inconsistent with the reference genotype, or heterozygous; the mutation and sequencing information obtained for it includes the genotype of the purpose SNP site, the allele population frequency, the mutation quality and the sequencing coverage;
A purpose SNP that cannot be found among the mutation sites of the sample mutation information has a homozygous genotype consistent with the reference genotype; the mutation and sequencing information obtained for it includes the genotype of the purpose SNP site, the allele population frequency and the sequencing coverage.
In one embodiment, in step S3, meeting the preset requirement means that the sequencing coverage is not less than 30 reads and the mutation quality meets the default quality control standard of GATK.
In one embodiment, step S4 includes:
Step S41: Determine whether individual identification is to be performed on the different samples; if so, compare the genotypes of all corresponding purpose SNP sites of the different samples and analyze, from the comparison result, whether the different samples belong to the same individual.
In one embodiment, step S4 includes:
Step S42: Determine whether parent-child relationship identification is to be performed on the different samples; if so, calculate the paternity index of each matched purpose SNP site from the genotype of the purpose SNP site and the corresponding allele population frequency, determine the comprehensive paternity index from the paternity indices of the matched purpose SNP sites, and analyze, from the comprehensive paternity index, whether the different samples are in a parent-child relationship;
A matched purpose SNP site is a purpose SNP site at which the different samples have at least one allele in common.
In one embodiment, step S4 includes:
Step S43: Determine whether kinship identification for relationships other than parent-child is to be performed on the different samples; if so, analyze the kinship from the number of unmatched purpose SNP sites;
An unmatched purpose SNP site is a purpose SNP site at which both alleles of the different samples differ, so that the samples have no allele in common.
A personal status relationship identification device includes:
A mutation information acquisition module, for obtaining the sample mutation information derived by aligning the sequencing result;
A purpose SNP information retrieval module, for searching the mutation sites of the sample mutation information for each of multiple purpose SNP sites in turn, obtaining mutation and sequencing information that includes the genotype of each purpose SNP site;
An identification information selection module, for selecting, from the multiple purpose SNP sites, the purpose SNP sites that meet a preset requirement, together with their mutation and sequencing information, to obtain identification information; and
A personal status relationship identification module, for comparing the genotypes of corresponding purpose SNP sites in the identification information of different samples to identify the personal status relationship between the samples.
A computer device has a processor and a memory; the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the personal status relationship identification method of any of the above embodiments.
A computer storage medium stores a computer program that, when executed, implements the steps of the personal status relationship identification method of any of the above embodiments.
With the above personal status relationship identification method, device, equipment and storage medium, a personal status relationship is identified by searching the mutation sites of the sample mutation information for each of multiple purpose SNP sites in turn, obtaining mutation and sequencing information that includes the genotype of each purpose SNP site; selecting, from the multiple purpose SNP sites, those that meet a preset requirement, together with their mutation and sequencing information, to obtain identification information; and finally comparing the genotypes of corresponding purpose SNP sites in the identification information of different samples. Because the rate at which an SNP mutates to another base is extremely low, and because even a mutation at a single purpose SNP has only a limited effect on the final result, identifying personal status relationships by SNPs can significantly improve the validity and reliability of the identification result compared with the traditional method based on STR detection.
Further, studies have found that individual identification and paternity testing are exact-match identifications with very low tolerance for mismatches, so 20 STRs are generally enough to give a fairly good result for them. For other kinship identification, however, not all sites match, which can introduce large random error. For example, 50% of the haploid sites between a grandparent and a grandchild are not homologous, so on average 10 of the 20 STRs give only the matching result of a random population; with so few sites, the number of sites that actually match fluctuates widely at random, and the identification result is very poor. SNPs, by contrast, are extremely numerous in the human genome (the 1000 Genomes Project reports about 80 million human polymorphic SNPs, with about 3.5 to 4 million per person on average), so SNPs give better support for all kinds of personal status relationship identification. SNPs can be used not only for paternity testing but also for individual identification and for kinship identification other than parent-child, with small error and high reliability.
Further, the traditional STR detection method is DNA fragment analysis, which is not a conventional DNA sequencing method. Most STRs lie in intergenic regions, many of which are currently considered non-functional, and ordinary sequencing projects do not cover these regions. When such a project needs to identify a personal status relationship, an additional STR detection experiment is usually required, which costs time and labor and raises the project cost. Gene exons and other functional non-coding regions, by contrast, carry a large number of SNPs, so the SNPs already measured in most research and clinical sequencing projects can be used directly to identify all kinds of personal status relationships without additional experiments. The above personal status relationship identification method therefore saves time and reduces testing cost.
Description of the drawings
Fig. 1 is a flow diagram of the personal status relationship identification method of an embodiment;
Fig. 2 is a flow diagram of a specific example of identifying the personal status relationship of different samples in Fig. 1;
Fig. 3 is a structural diagram of the personal status relationship identification device of an embodiment;
Fig. 4 is a structural diagram of a specific example of the personal status relationship identification module in Fig. 3.
Detailed description of the embodiments
To facilitate understanding of the present invention, the invention is described more fully below with reference to the accompanying drawings, which show preferred embodiments of the invention. The invention can, however, be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and completely.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the invention. The terms used in this description are intended only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
"Personal status relationship identification" as used herein includes individual identification, parent-child relationship identification and kinship identification for relationships other than parent-child, such as grandfather-grandson, uncle-nephew, sibling, cousin and nephew-uncle relationship identification. The "population mutation frequency" of an SNP site is the frequency, in a specific population (such as the Chinese population), of the base at that SNP site that is inconsistent with the reference sequence. The "allele population frequency" of an SNP site is the frequency of each allele of that SNP site in a specific population (such as the Chinese population). "Mutation quality" refers to the default quality control standard given by GATK (or other mutation analysis software). A "read" is a sequence produced by a high-throughput sequencing platform (such as the various second-generation sequencing platforms). "Sequencing coverage" is the number of reads covering a sequenced site.
As shown in Fig. 1, an embodiment of the invention provides a personal status relationship identification method comprising the following steps:
Step S110: Obtain the sample mutation information derived by aligning the sequencing result.
Each sample can be sequenced by, but not limited to, second-generation sequencing to obtain a sequencing result. The sequencing result can then be aligned to the human reference genome and analyzed to produce a mutation file, which contains the sample mutation information of that sample. The sample mutation information includes the mutation sites, mutation frequency, mutation quality and similar information. A mutation is defined relative to the reference genome: the sequencing result shows a variation that differs from the sequence of the corresponding region or site in the reference genome.
Step S120: Search the mutation sites of the sample mutation information for each of multiple purpose SNP sites in turn, obtaining mutation and sequencing information that includes the genotype of each purpose SNP site.
Each purpose SNP site is preferably located on an autosomal exon or in a functional non-coding region, with an allele population frequency between 0.45 and 0.55. For parent-child relationship identification (father-child or mother-child), about 100 retrieved purpose SNP sites generally give an accuracy of about 99.999%, and 960 sites give about $(100 - 10^{-53})\%$, so for parent-child relationship identification the number of purpose SNP sites can be required to be not less than 100. For other kinship identification, the later analysis is always an inference from the expected number of unmatched sites, however many purpose SNP sites are used. It cannot give a 100% conclusion, but the confidence is still very high, and in general the more purpose SNP sites there are, the more reliable the result. For example, not less than 720 purpose SNP sites generally allow kinship identification at the cousin level, not less than 480 allow it at the grandfather-grandson or uncle-nephew level, and not less than 240 allow it at the sibling level. Individual identification tests whether the genotypes of the purpose SNP sites match exactly, and the number of purpose SNP sites can generally be required to be not less than 50.
In a specific example, multiple purpose SNP sites can be selected from the 984 purpose SNP sites shown in Table 1 below, which lie on autosomal exons and have allele population frequencies between 0.45 and 0.55 in the Chinese population. These purpose SNP sites are covered by most gene exon sequencing projects.
Table 1
Note: The reference sequence for the above SNP sites is hg19. Taking the purpose SNP site written "10|101293035|C|A" as an example, "|" is the field separator, "10" is the chromosome number, "101293035" is the coordinate on that chromosome, "C" is the base consistent with the corresponding site in the reference genome, and "A" is the other base, which is inconsistent with the corresponding site in the reference genome. The other purpose SNP sites are written in the same way.
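The notation in the note can be split mechanically into its four fields. The sketch below is illustrative only: the `PurposeSnp` type and the `parse_snp_id` helper are hypothetical names chosen here, not part of the patent.

```python
from typing import NamedTuple


class PurposeSnp(NamedTuple):
    """One purpose SNP site in hg19 coordinates (hypothetical container)."""
    chrom: str  # chromosome number, e.g. "10"
    pos: int    # coordinate on that chromosome
    ref: str    # base consistent with the reference genome
    alt: str    # the other base, inconsistent with the reference


def parse_snp_id(snp_id: str) -> PurposeSnp:
    """Split an identifier such as '10|101293035|C|A' on the '|' separator."""
    chrom, pos, ref, alt = snp_id.split("|")
    return PurposeSnp(chrom, int(pos), ref, alt)


snp = parse_snp_id("10|101293035|C|A")
print(snp.chrom, snp.pos, snp.ref, snp.alt)
```

Keeping the chromosome as a string rather than an integer is a design choice made here so that the same type could hold non-numeric chromosome names if a site list ever needed them.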
When the mutation sites of the sample mutation information are searched for each purpose SNP site in turn, the current purpose SNP may or may not be found. A purpose SNP that is found among the mutation sites of the sample mutation information has a genotype that is either homozygous and inconsistent with the reference genotype, or heterozygous; the mutation and sequencing information obtained for it includes the genotype of the purpose SNP site, the allele population frequency, the mutation quality and the sequencing coverage. A purpose SNP that is not found among the mutation sites of the sample mutation information has a homozygous genotype consistent with the reference genotype; the mutation and sequencing information obtained for it includes the genotype of the purpose SNP site, the allele population frequency and the sequencing coverage. Information such as sequencing coverage can be calculated from the sample's sequencing alignment file (such as a bam file) for the purpose SNP site currently being retrieved.
Denote the reference allele R and the mutant allele V. Because humans are diploid, a purpose SNP that is found among the mutation sites of the sample mutation information means that the sample's genotype at that purpose SNP site is VV (homozygous) or RV (heterozygous), and a purpose SNP that is not found means that the genotype is RR.
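The lookup rule above can be sketched as a dictionary search. This is a minimal illustration under stated assumptions: the function name `call_genotypes`, the tuple key `(chrom, pos, ref, alt)` and the `"het"`/`"hom"` zygosity labels are all hypothetical, and the example coordinates other than the first are made up.

```python
def call_genotypes(purpose_snps, sample_variants):
    """Assign RR, RV or VV to every purpose SNP site.

    purpose_snps:    iterable of (chrom, pos, ref, alt) tuples.
    sample_variants: dict mapping the same tuples to "het" or "hom" for the
                     mutation sites reported in the sample (assumed layout).
    """
    genotypes = {}
    for snp in purpose_snps:
        zygosity = sample_variants.get(snp)
        if zygosity is None:
            genotypes[snp] = "RR"   # not found: homozygous reference
        elif zygosity == "het":
            genotypes[snp] = "RV"   # found, heterozygous
        else:
            genotypes[snp] = "VV"   # found, homozygous variant
    return genotypes


# Hypothetical example data; only the first site appears in the text.
sites = [("10", 101293035, "C", "A"), ("1", 12345, "G", "T"), ("2", 999, "A", "G")]
variants = {("10", 101293035, "C", "A"): "het", ("2", 999, "A", "G"): "hom"}
print(call_genotypes(sites, variants))
```

A site that is absent from the variant list is only safely RR if it was actually covered by reads, which is why the text carries the sequencing coverage along with each genotype.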
Step S130: Select, from the multiple purpose SNP sites, the purpose SNP sites that meet a preset requirement, together with their mutation and sequencing information, to obtain identification information.
Specifically, meeting the preset requirement means that the sequencing coverage is not less than 30 reads and the mutation quality meets the default quality control standard of GATK.
The default quality control standard of GATK is QD > 2.0, MQ > 40.0, FS < 60.0, HaplotypeScore < 60.0, MQRankSum > -12.5 and ReadPosRankSum > -8.0.
Reliability analysis of the multiple purpose SNP sites selects the high-quality sites shared by the samples and avoids the influence of unreliable sites on the result. Genotyping a purpose SNP site requires sufficient coverage and quality control; otherwise the site may well be mistyped by chance. For example, suppose the father is AA at a purpose SNP site and the son is AT. If the son's coverage at that site is very low or of poor quality, for instance only 5 reads, all 5 reads may happen to be T, or poor quality may prevent A from being detected, and the son will be mistyped as TT.
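The selection rule can be sketched as a per-site predicate. The thresholds are the ones listed above; everything else is an assumption made for illustration: the dictionary layout, the name `passes_qc`, treating the 30-read threshold as inclusive, checking RR sites on coverage alone (the text lists no mutation quality for them), and skipping an annotation that is absent from a call.

```python
# Thresholds quoted in the text: (comparison, limit) per GATK annotation.
GATK_DEFAULT_THRESHOLDS = {
    "QD": (">", 2.0),
    "MQ": (">", 40.0),
    "FS": ("<", 60.0),
    "HaplotypeScore": ("<", 60.0),
    "MQRankSum": (">", -12.5),
    "ReadPosRankSum": (">", -8.0),
}


def passes_qc(site, min_coverage=30):
    """Return True if a purpose SNP site meets the preset requirement."""
    if site["coverage"] < min_coverage:
        return False
    if site["genotype"] == "RR":
        # Assumption: no variant call exists, so only coverage is checked.
        return True
    for name, (op, limit) in GATK_DEFAULT_THRESHOLDS.items():
        value = site.get(name)
        if value is None:
            continue  # assumption: an annotation that is absent is not tested
        if op == ">" and not value > limit:
            return False
        if op == "<" and not value < limit:
            return False
    return True


# Hypothetical site record for a heterozygous call.
example = {"genotype": "RV", "coverage": 45, "QD": 12.0, "MQ": 60.0, "FS": 1.2,
           "HaplotypeScore": 0.5, "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
print(passes_qc(example))
```

In the father-son example above, the son's site with only 5 reads would be rejected by the coverage check before its genotype could be compared.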
Step S140: Compare the genotypes of corresponding purpose SNP sites in the identification information of different samples to identify the personal status relationship between the samples.
The genotypes of the selected purpose SNP sites that meet the preset requirement are collected to obtain the identification information, which can be written out as an identification information file, for example in utag format.
The identification information file can be used for individual identification, kinship identification and so on.
Taking the parent-child relationship as an example, the parentage exclusion probability of a single purpose SNP site is $PE = 2p^2(1-p)^2$, where $p$ is the allele population frequency of the purpose SNP site. PE reaches its maximum of 0.125 at $p = 0.5$. For $p$ between 0.45 and 0.55, the parentage exclusion probability of a single purpose SNP site is at least 0.1225125. For 984 purpose SNP sites, the parentage exclusion probability obtained with the method of the invention is far higher than that of the traditional paternity test method based on 20 STR sites.
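The single-site figures above can be checked numerically. The combination across sites shown here is an assumption: it uses the standard cumulative form 1 - prod(1 - PE_i), which treats the sites as independent, because the combined formula itself is not legible in this text. The function names are hypothetical.

```python
import math


def exclusion_probability(p):
    """Single-site parentage exclusion probability PE = 2 * p^2 * (1 - p)^2."""
    return 2 * p ** 2 * (1 - p) ** 2


def log10_residual(freqs):
    """log10 of prod(1 - PE_i), the chance that no site excludes.

    Assumes independent sites; working in logarithms avoids underflow
    when hundreds of sites are combined.
    """
    return sum(math.log10(1 - exclusion_probability(p)) for p in freqs)


print(exclusion_probability(0.5))    # maximum, reached at p = 0.5
print(exclusion_probability(0.45))   # lower bound for p in [0.45, 0.55]
print(log10_residual([0.5] * 984))   # residual exponent for 984 sites
```

Under the independence assumption, the cumulative exclusion probability is 1 minus ten to the power of the value returned by `log10_residual`.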
In a specific example, as shown in Fig. 2, step S140 includes:
Step S141: Determine whether individual identification or kinship identification is to be performed on the different samples; if individual identification, execute step S142; otherwise execute step S143.
Step S142: Compare the genotypes of all corresponding purpose SNP sites of the different samples and analyze, from the comparison result, whether the different samples belong to the same individual.
In principle, individual identification requires the genotypes of all corresponding purpose SNP sites to be completely consistent before the samples can be judged to come from the same individual. When a large number of purpose SNP sites are compared, however, inconsistent genotypes at a very small number of sites can be analyzed case by case; possible causes include degradation of the sample DNA or mutation of certain SNPs during embryonic differentiation of the tested individual. Mutations arising during embryonic differentiation can give the genes of different tissues of one body small differences, and the samples used for individual identification may come from different parts of the body. This possibility is very low but it exists; in general it does not affect the judgment of individual identification.
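The comparison in step S142 can be sketched as follows. The function name `same_individual`, the genotype dictionaries and the `max_discordant` parameter are hypothetical; the text gives no numeric tolerance for the "very small number" of inconsistent sites, so the default here is strict equality and the discordant sites are returned for case-by-case review.

```python
def same_individual(geno_a, geno_b, max_discordant=0):
    """Compare two samples' genotypes at their shared purpose SNP sites.

    geno_a, geno_b: dicts mapping a site identifier to "RR", "RV" or "VV".
    Returns (verdict, discordant_sites).
    """
    shared = geno_a.keys() & geno_b.keys()
    if not shared:
        raise ValueError("the samples share no purpose SNP sites")
    discordant = sorted(s for s in shared if geno_a[s] != geno_b[s])
    return len(discordant) <= max_discordant, discordant


# Hypothetical site identifiers and genotypes.
sample_1 = {"s1": "RR", "s2": "RV", "s3": "VV"}
sample_2 = {"s1": "RR", "s2": "VV", "s3": "VV"}
print(same_individual(sample_1, sample_2))
```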
Step S143: Count the number of matched purpose SNP sites and/or the number of unmatched purpose SNP sites among the corresponding purpose SNP sites of the different samples.
A matched purpose SNP site is a purpose SNP site at which the different samples have at least one allele in common. An unmatched purpose SNP site is a purpose SNP site at which both alleles of the different samples differ. The number of matched purpose SNP sites plus the number of unmatched purpose SNP sites equals the total number of purpose SNP sites in the identification information.
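With genotypes written as two-letter strings over R and V, the matched/unmatched definition reduces to a set intersection. A minimal sketch, with `count_matches` and the site identifiers as hypothetical names:

```python
def count_matches(geno_a, geno_b):
    """Count matched and unmatched purpose SNP sites between two samples.

    A site is matched if the two genotypes share at least one allele and
    unmatched if they share none; with biallelic sites the only unmatched
    pairing is RR against VV.
    """
    matched = unmatched = 0
    for site in geno_a.keys() & geno_b.keys():
        if set(geno_a[site]) & set(geno_b[site]):
            matched += 1
        else:
            unmatched += 1
    return matched, unmatched


# Hypothetical genotypes: s1 is RR vs VV (unmatched), s2 and s3 share V.
parent = {"s1": "RR", "s2": "RV", "s3": "VV"}
child = {"s1": "VV", "s2": "VV", "s3": "VV"}
print(count_matches(parent, child))
```

By construction the two counts add up to the number of shared sites, which mirrors the statement that matched plus unmatched equals the total.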
Step S144: Determine whether parent-child relationship identification or kinship identification for relationships other than parent-child is to be performed on the different samples; if parent-child relationship identification, execute step S145; otherwise execute step S146.
Step S145: Calculate the paternity index of each matched purpose SNP site from the genotype of the purpose SNP site and the corresponding allele population frequency, determine the comprehensive paternity index from the paternity indices of the matched purpose SNP sites, and analyze, from the comprehensive paternity index, whether the different samples are in a parent-child relationship.
The paternity index PI of each matched purpose SNP site is calculated by a formula in which $p_i$ is the population frequency of the matching allele, and PI is summed over all the cases in which the genotypes can match. The comprehensive paternity index CPI is the product of all PI values.
Whether the different samples are in a parent-child relationship can be analyzed from the comprehensive paternity index CPI; in general, a parent-child relationship can be concluded when CPI > 1000.
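Given per-site PI values, the product and the CPI > 1000 rule can be sketched as below. The per-site PI formula is not legible in this text, so the PI values are taken as inputs and the numbers in the example are made up; `combined_paternity_index` and `supports_parentage` are hypothetical names.

```python
import math


def combined_paternity_index(pi_values):
    """CPI as the product of the per-site paternity indices."""
    return math.prod(pi_values)


def supports_parentage(pi_values, threshold=1000.0):
    """Apply the CPI > threshold rule in log space.

    Summing logarithms gives the same decision as multiplying the PI
    values, but stays numerically stable over many hundreds of sites.
    Every PI value must be positive.
    """
    log_cpi = sum(math.log10(pi) for pi in pi_values)
    return log_cpi > math.log10(threshold)


# Made-up PI values, for illustration only.
print(combined_paternity_index([2.0, 4.0, 0.5]))
print(supports_parentage([2.0] * 10))
```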
Step S146: Analyze the kinship of relationships other than parent-child from the number of unmatched purpose SNP sites.
It can be understood that, in other embodiments, step S140 may perform only one or two of individual identification, parent-child relationship identification and kinship identification for relationships other than parent-child. Accordingly, in one specific example, step S140 includes: determining whether individual identification is to be performed on the different samples and, if so, comparing the genotypes of all corresponding purpose SNP sites of the different samples and analyzing, from the comparison result, whether the different samples belong to the same individual. In another specific example, step S140 includes: determining whether parent-child relationship identification is to be performed on the different samples and, if so, calculating the paternity index of each matched purpose SNP site from the genotype of the purpose SNP site and the corresponding allele population frequency, determining the comprehensive paternity index from the paternity indices of the matched purpose SNP sites, and analyzing, from the comprehensive paternity index, whether the different samples are in a parent-child relationship, where a matched purpose SNP site is a purpose SNP site at which the different samples have at least one allele in common. In yet another specific example, step S140 includes: determining whether kinship identification for relationships other than parent-child is to be performed on the different samples and, if so, analyzing the kinship from the number of unmatched purpose SNP sites, where an unmatched purpose SNP site is a purpose SNP site at which both alleles of the different samples differ.
More specifically, in one example, if the number of unmatched purpose SNP sites ≈ the total number of purpose SNP sites / 16, the relationship can be considered grandfather-grandson, uncle-nephew or the like; if the number of unmatched purpose SNP sites ≈ the total number of purpose SNP sites / 32, the relationship can be considered sibling.
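The ≈ comparisons in this example can be sketched as a nearest-ratio test. The text does not say how close "approximately" must be, so the `tolerance` parameter (a relative deviation of 25% by default) is purely an assumption, as are the function name and the labels.

```python
def suggest_relationship(unmatched, total, tolerance=0.25):
    """Suggest a kinship class from the unmatched fraction of sites.

    Returns the label whose expected fraction (1/16 or 1/32, from the
    text) is nearest to unmatched / total, or None if even the nearest
    one deviates by more than `tolerance` (relative, hypothetical).
    """
    ratio = unmatched / total
    expected = {
        "grandfather-grandson or uncle-nephew": 1 / 16,
        "siblings": 1 / 32,
    }
    label, target = min(expected.items(), key=lambda kv: abs(ratio - kv[1]))
    if abs(ratio - target) <= tolerance * target:
        return label
    return None


print(suggest_relationship(60, 960))
print(suggest_relationship(30, 960))
print(suggest_relationship(0, 960))
```

Returning `None` rather than forcing a label keeps pairs such as parent and child, where no unmatched sites are expected, out of both classes.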
Here a concept is introduced: an uncorrelated site is a site at which two samples have no genetic association at the haploid level. Only uncorrelated sites can cause a purpose SNP of the two samples to mismatch. For an SNP with a population frequency of 0.5, the genotypes AA, BB and AB have frequencies 0.25, 0.25 and 0.5 respectively, and the SNP mismatches when the two samples are AA and BB. The probability of this is 2 * 0.25 * 0.25 = 0.125, that is 1/8, which is the maximum non-paternity exclusion rate of a single SNP site.
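The 1/8 figure generalizes to any allele frequency. As a sketch, assuming Hardy-Weinberg genotype proportions at an uncorrelated site (an assumption that the 0.25/0.25/0.5 frequencies imply but the text does not state), with $p$ the frequency of allele A and $1-p$ that of allele B:

```latex
\begin{aligned}
P(AA) &= p^2, \qquad P(BB) = (1-p)^2, \qquad P(AB) = 2p(1-p) \\
P(\text{mismatch}) &= P(AA)\,P(BB) + P(BB)\,P(AA) = 2p^2(1-p)^2 \\
P(\text{mismatch})\big|_{p=0.5} &= 2 \cdot 0.25 \cdot 0.25 = 0.125 = \tfrac{1}{8}
\end{aligned}
```

This is the same expression as the single-site parentage exclusion probability PE, and it is largest at $p = 0.5$, which is why sites with allele population frequencies near 0.5 are the most informative.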
Below with the corresponding purpose SNP sums 960 of two samples, allele crowd's frequency of all purposes SNP site It is to demonstrate the unmatched purpose SNP site number under different kinships for 0.5:
1. A son inherits one chromosome entirely from his father, so the number of unrelated sites between father and son is 0;
2. Considering that crossover between non-sister chromatids during meiosis produces genetic recombination, the chromosome that the son inherits from his father has an expected fraction of 0.5 inherited from the grandfather, so the number of unrelated sites between grandparent and grandchild is 960 × 0.5 = 480;
3. Similarly, the chromosome that the son inherits from his father has an expected fraction of 0.5 inherited from the grandmother. The grandfather and the grandmother each had a 50% chance of passing their part of this chromosome on to the uncle, so the expected fraction of this chromosome that the uncle possesses is 0.5 × 50% + 0.5 × 50% = 0.5, and the number of unrelated sites between uncle and nephew is 960 × (1 − 0.5) = 480;
4. For siblings, a site is unrelated only when both alleles come from different sources. For example, if the father is Aa and the mother is Bb, the elder brother must inherit AB and the younger brother ab, or the two must inherit another cross combination such as Ab and aB. If a non-cross combination that shares a common allele occurs, such as AB and Ab, the purpose SNP is a related, matching site. The probability of a cross combination is 0.5 × 0.5 = 0.25, so the number of unrelated sites between siblings is 960 × 0.25 = 240;
5. As calculated above, the probability that uncle and nephew share a chromosome segment of the same origin is 0.5, and the probability that the uncle passes this segment on to the cousin is 0.5, so the probability that the cousin carries this segment is 0.25, and the number of unrelated sites between cousins is 960 × (1 − 0.25) = 720;
6. Similarly, for the more distant uncle-nephew relationship in which the shared fraction halves again to 0.125, the number of unrelated sites is 960 × (1 − 0.125) = 840.
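The arithmetic in items 1 to 6 can be reproduced with a short sketch. The dictionary below simply records the unrelated fraction stated in each item; the names `UNRELATED_FRACTION` and `expected_unrelated_sites` are illustrative and do not appear in the original method.

```python
# Illustrative sketch: expected number of unrelated sites per kinship,
# using the 960-site example and the fractions derived in items 1-6.
TOTAL_SITES = 960

# Fraction of sites expected to be unrelated at the haploid level.
# Keys are descriptive labels chosen for this sketch only.
UNRELATED_FRACTION = {
    "father-son": 0.0,                      # item 1
    "grandparent-grandchild": 0.5,          # item 2
    "uncle-nephew": 1 - 0.5,                # item 3
    "siblings": 0.5 * 0.5,                  # item 4: cross combinations only
    "cousins": 1 - 0.25,                    # item 5
    "distant uncle-nephew (item 6)": 1 - 0.125,
}


def expected_unrelated_sites(kinship, total_sites=TOTAL_SITES):
    """Return the expected count of unrelated sites for a kinship label."""
    return total_sites * UNRELATED_FRACTION[kinship]


def all_expected(total_sites=TOTAL_SITES):
    """Return a dict mapping every kinship label to its expected count."""
    return {k: expected_unrelated_sites(k, total_sites) for k in UNRELATED_FRACTION}
```

Calling `all_expected()` returns the six values 0, 480, 480, 240, 720 and 840 given in the list above.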
Table 2 below shows, for each kinship, the expected number of unrelated sites and the expected number of unmatched purpose SNP sites.
Table 2
The above are the expected results when the allele population frequency of every SNP site considered is 0.5. In practice, SNP allele population frequencies deviate from 0.5, which lowers the exclusion rate and therefore reduces the number of unmatched purpose SNPs.
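The effect of a frequency deviating from 0.5 can be illustrated as follows. This is a minimal sketch that assumes Hardy-Weinberg genotype proportions and two unrelated individuals, which is consistent with the 0.25/0.25/0.5 genotype frequencies used in the text for a frequency of 0.5; the function names are illustrative.

```python
def genotype_frequencies(p):
    """Genotype frequencies for a biallelic SNP with allele A at frequency p.

    Assumption of this sketch: Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def mismatch_probability(p):
    """Probability that two unrelated individuals are opposite homozygotes.

    One sample is AA and the other BB, in either order, hence the factor 2.
    """
    f = genotype_frequencies(p)
    return 2.0 * f["AA"] * f["BB"]
```

At a frequency of 0.5 this gives 2 × 0.25 × 0.25 = 0.125, the maximum stated above. At 0.45 or 0.55 it gives about 0.1225, so within the 0.45 to 0.55 range the per-site exclusion rate stays close to the maximum.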
Example testing and study show that, for multiple purpose SNP sites whose allele population frequency is between 0.45 and 0.55 and which serve as the retrieval objects in step S120 above, the expected numbers of unmatched purpose SNP sites in Table 2 above can be used as the reference when finally judging kinships other than the parent-child relationship.
When identifying a personal status relationship, the above personal status relationship identification method retrieves multiple purpose SNPs one by one among the mutation sites contained in the sample mutation information of a gene sample, obtains the mutation and sequencing information of each purpose SNP, and judges from that information whether the reliability of each purpose SNP meets a preset requirement. The purpose SNPs that meet the preset requirement, together with their mutation and sequencing information, are selected to build the identification information. Finally, the corresponding purpose SNPs and their mutation and sequencing information in the identification information of different samples are compared, so that the personal status relationship of the different samples is identified from the identification information. Because the rate at which a SNP mutates in another direction is extremely low, and even a single mutated purpose SNP has only a limited influence on the final result, identifying personal status relationships by SNPs can significantly improve the validity and reliability of the identification result compared with the traditional method based on STR detection.
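The retrieval and selection steps summarized above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layouts, the field names (`ref`, `frequency`, `genotype`, `quality_ok`) and the function names are hypothetical. The treatment of sites not found among the mutation sites as homozygous for the reference allele, and the threshold of 30 reads, follow claims 3 and 4; the mutation quality check is reduced here to a precomputed boolean.

```python
MIN_COVERAGE = 30  # reads; the coverage threshold named in claim 4


def retrieve_purpose_snps(purpose_snps, variant_calls, coverage):
    """Look up each purpose SNP among the sample's mutation sites.

    purpose_snps:  {site_id: {"ref": allele, "frequency": float}}  (hypothetical layout)
    variant_calls: {site_id: {"genotype": (a1, a2), "quality_ok": bool}}
    coverage:      {site_id: sequencing depth in reads}
    """
    records = {}
    for site, meta in purpose_snps.items():
        call = variant_calls.get(site)
        if call is not None:
            # Found: genotype differs from the reference (homozygous or heterozygous).
            genotype, quality_ok = tuple(sorted(call["genotype"])), call["quality_ok"]
        else:
            # Not found: homozygous for the reference allele; no mutation quality applies.
            genotype, quality_ok = (meta["ref"], meta["ref"]), True
        records[site] = {
            "genotype": genotype,
            "frequency": meta["frequency"],
            "coverage": coverage.get(site, 0),
            "quality_ok": quality_ok,
        }
    return records


def select_identification_info(records, min_coverage=MIN_COVERAGE):
    """Keep only purpose SNPs whose coverage and quality meet the requirement."""
    return {
        site: rec
        for site, rec in records.items()
        if rec["coverage"] >= min_coverage and rec["quality_ok"]
    }
```

Filtering before comparison means that a site with too few reads is dropped rather than counted as a match or a mismatch.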
Further study found that individual identification and paternity testing are complete-match identifications with an extremely low tolerance for mismatches, so 20 STRs are generally enough to give a fairly good identification result for them. For the identification of other kinships, however, not all sites match, which can lead to a large random error in the identification. For example, 50% of the sites are non-homologous at the haploid level between grandparent and grandchild, so on average 10 of the 20 STRs give the matching result of a random population. With so few sites, the number of sites that actually match fluctuates widely at random, and the identification result is very poor. The number of SNPs in the human genome, by contrast, is very large (the 1000 Genomes Project reported that human polymorphic SNPs reach 80 million, with about 3.5 to 4 million per person on average), so SNPs can provide better support for identifying all kinds of personal status relationships. SNPs can be used not only for paternity testing but also for individual identification and for the identification of kinships other than the parent-child relationship, with small error and high reliability.
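The point about random fluctuation can be made concrete with a simple model. The sketch below assumes that sites are independent and that each site is unrelated with probability 0.5 (the grandparent-grandchild case), so the count of unrelated sites is binomial. This binomial model is an assumption of the sketch, not something stated in the original, and it compares only the effect of the number of sites.

```python
import math


def relative_fluctuation(n_sites, p_unrelated=0.5):
    """Standard deviation of the unrelated-site count divided by its mean.

    Assumes independent sites, i.e. a binomial count (an assumption of this sketch).
    """
    mean = n_sites * p_unrelated
    std = math.sqrt(n_sites * p_unrelated * (1.0 - p_unrelated))
    return std / mean
```

Under this model, 20 sites give a relative fluctuation of about 22%, while 960 sites give about 3%, which illustrates why a larger panel gives a more stable count.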
Furthermore, the traditional method of detecting STRs is DNA fragment analysis, which is not a conventional DNA sequencing method. STRs are mostly located in intergenic regions, many of which are currently considered non-functional, and ordinary sequencing projects do not cover these regions. When a personal status relationship needs to be identified in such a sequencing project, an additional STR detection experiment is usually required, which is time-consuming and laborious and raises the project cost. Gene exons and other functional non-coding regions, on the other hand, already contain a sufficiently large number of SNPs, so the SNPs that have already been measured in most research and clinical sequencing projects can be used for personal status relationship identification, and all kinds of personal status relationships can be identified without additional experiments. For example, when a sequencing project for a genetic disease is received in the clinic and, after whole-exome sequencing analysis, a consanguineous marriage is clinically suspected, the above personal status relationship identification method can be applied directly to the SNPs already sequenced, with no additional experiment. The above personal status relationship identification method therefore saves time and reduces the testing cost.
As shown in Fig. 3, based on the same idea as above, an embodiment of the present invention further provides a personal status relationship identification apparatus 200, which comprises:
a mutation information acquisition module 210, configured to obtain the sample mutation information obtained by aligning the sequencing results;
a purpose SNP information retrieval module 220, configured to retrieve multiple purpose SNP sites one by one among the mutation sites of the sample mutation information, and obtain mutation and sequencing information that includes the genotype of each purpose SNP site;
an identification information selection module 230, configured to select, from the multiple purpose SNP sites, the purpose SNP sites that meet a preset requirement together with their mutation and sequencing information, to obtain identification information; and
a personal status relationship identification module 240, configured to compare the genotypes of corresponding purpose SNP sites in the identification information of different samples and identify the personal status relationship of the different samples.
In a specific example, the personal status relationship identification module 240 includes a first judgment module 241, an individual identification module 242, a match statistics module 243, a second judgment module 244, a parent-child relationship identification module 245 and an other-kinship identification module 246.
The first judgment module 241 is configured to judge whether individual identification or kinship identification is to be performed on the different samples.
The individual identification module 242 is configured to compare the genotypes of all corresponding purpose SNP sites of the different samples and analyze, according to the comparison result, whether the different samples belong to the same individual.
The match statistics module 243 is configured to count the number of matched purpose SNP sites and/or the number of unmatched purpose SNP sites among the corresponding purpose SNP sites of the different samples.
The second judgment module 244 is configured to judge whether parent-child relationship identification or identification of other kinships besides the parent-child relationship is to be performed on the different samples.
The parent-child relationship identification module 245 is configured to calculate the paternity index of each matched purpose SNP site from the genotype of the purpose SNP site and the corresponding allele population frequency, determine a combined paternity index from the paternity indices of the matched purpose SNP sites, and analyze, according to the combined paternity index, whether the different samples are in a parent-child relationship.
The other-kinship identification module 246 is configured to analyze kinships other than the parent-child relationship according to the number of unmatched purpose SNP sites.
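The counting performed by the match statistics module and the comparison performed by the individual identification module can be sketched as follows. The definitions of matched (at least one allele in common) and unmatched (both alleles different) follow claims 6 and 7. The function names, the genotype-as-tuple layout, and the strict all-sites-equal rule for individual identification are assumptions of this sketch; the original does not specify the decision threshold.

```python
def count_matches(genotypes_a, genotypes_b):
    """Count matched and unmatched purpose SNP sites shared by two samples.

    genotypes_a / genotypes_b: {site_id: (allele1, allele2)}  (hypothetical layout)
    A site is matched if the two samples share at least one allele,
    and unmatched if both alleles differ.
    """
    shared_sites = genotypes_a.keys() & genotypes_b.keys()
    matched = sum(
        1 for s in shared_sites if set(genotypes_a[s]) & set(genotypes_b[s])
    )
    return matched, len(shared_sites) - matched


def same_individual(genotypes_a, genotypes_b):
    """Return True if every shared site has an identical genotype.

    Strict equality is an assumption of this sketch; a practical rule
    might tolerate a small number of discordant sites.
    """
    shared_sites = genotypes_a.keys() & genotypes_b.keys()
    return bool(shared_sites) and all(
        sorted(genotypes_a[s]) == sorted(genotypes_b[s]) for s in shared_sites
    )
```

The unmatched count returned by `count_matches` is the quantity that would be compared against the expected values for each kinship.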
Based on the embodiments described above, the present invention further provides a computer device usable for personal status relationship identification. It has a processor and a memory; a computer program is stored in the memory, and the processor, when executing the computer program, implements the steps of the personal status relationship identification method of any of the above embodiments.
A person of ordinary skill in the art will understand that all or part of the flows in the above methods can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium; in embodiments of the present invention, the program can be stored in the storage medium of a computer system and executed by at least one processor of that computer system to implement the flows of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Accordingly, the present invention further provides a computer storage medium usable for processing primer sequences for sequencing library construction, on which a computer program is stored; when executed, the computer program implements the steps of the personal status relationship identification method of any of the above embodiments.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments is described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A personal status relationship identification method, characterized by comprising the following steps:
Step S1: obtaining the sample mutation information obtained by aligning the sequencing results;
Step S2: retrieving multiple purpose SNP sites one by one among the mutation sites of the sample mutation information, to obtain mutation and sequencing information that includes the genotype of each purpose SNP site;
Step S3: selecting, from the multiple purpose SNP sites, the purpose SNP sites that meet a preset requirement together with their mutation and sequencing information, to obtain identification information;
Step S4: comparing the genotypes of corresponding purpose SNP sites in the identification information of different samples, and identifying the personal status relationship of the different samples.
2. The personal status relationship identification method according to claim 1, characterized in that each purpose SNP site is located on an autosomal exon or in a functional non-coding region, and has an allele population frequency between 0.45 and 0.55.
3. The personal status relationship identification method according to claim 1, characterized in that, in step S2, for a purpose SNP that can be retrieved among the mutation sites of the sample mutation information, the genotype of the purpose SNP site is a homozygous or heterozygous genotype inconsistent with the reference genotype, and the mutation and sequencing information obtained includes the genotype, allele population frequency, mutation quality and sequencing coverage of the purpose SNP site;
for a purpose SNP that cannot be retrieved among the mutation sites of the sample mutation information, the genotype of the purpose SNP site is a homozygous genotype consistent with the reference genotype, and the mutation and sequencing information obtained includes the genotype, allele population frequency and sequencing coverage of the purpose SNP site.
4. The personal status relationship identification method according to claim 3, characterized in that, in step S3, meeting the preset requirement means that the sequencing coverage is not less than 30 reads and the mutation quality meets the default quality control standard of GATK.
5. The personal status relationship identification method according to any one of claims 1 to 4, characterized in that step S4 comprises:
Step S41: judging whether individual identification is to be performed on the different samples, and if so, comparing the genotypes of all corresponding purpose SNP sites of the different samples and analyzing, according to the comparison result, whether the different samples belong to the same individual.
6. The personal status relationship identification method according to any one of claims 1 to 4, characterized in that step S4 comprises:
Step S42: judging whether parent-child relationship identification is to be performed on the different samples, and if so, calculating the paternity index of each matched purpose SNP site from the genotype of the purpose SNP site and the corresponding allele population frequency, determining a combined paternity index from the paternity indices of the matched purpose SNP sites, and analyzing, according to the combined paternity index, whether the different samples are in a parent-child relationship;
the matched purpose SNP site being a purpose SNP site at which the different samples have at least one identical allele.
7. The personal status relationship identification method according to any one of claims 1 to 4, characterized in that step S4 comprises:
Step S43: judging whether identification of kinships other than the parent-child relationship is to be performed on the different samples, and if so, analyzing the kinship other than the parent-child relationship according to the number of unmatched purpose SNP sites;
the unmatched purpose SNP site being a purpose SNP site at which both alleles of the different samples differ.
8. A personal status relationship identification apparatus, characterized by comprising:
a mutation information acquisition module, configured to obtain the sample mutation information obtained by aligning the sequencing results;
a purpose SNP information retrieval module, configured to retrieve multiple purpose SNP sites one by one among the mutation sites of the sample mutation information, and obtain mutation and sequencing information that includes the genotype of each purpose SNP site;
an identification information selection module, configured to select, from the multiple purpose SNP sites, the purpose SNP sites that meet a preset requirement together with their mutation and sequencing information, to obtain identification information; and
a personal status relationship identification module, configured to compare the genotypes of corresponding purpose SNP sites in the identification information of different samples and identify the personal status relationship of the different samples.
9. A computer device, characterized by having a processor and a memory, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the personal status relationship identification method according to any one of claims 1 to 7.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the personal status relationship identification method according to any one of claims 1 to 7.
CN201810490416.4A 2018-05-21 2018-05-21 Identity relationship identification method, device, equipment and storage medium Active CN108647495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810490416.4A CN108647495B (en) 2018-05-21 2018-05-21 Identity relationship identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810490416.4A CN108647495B (en) 2018-05-21 2018-05-21 Identity relationship identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108647495A true CN108647495A (en) 2018-10-12
CN108647495B CN108647495B (en) 2020-04-10

Family

ID=63757290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810490416.4A Active CN108647495B (en) 2018-05-21 2018-05-21 Identity relationship identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108647495B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539967A (en) * 2008-12-12 2009-09-23 深圳华大基因研究院 Method for detecting mononucleotide polymorphism
WO2016049993A1 (en) * 2014-09-30 2016-04-07 深圳华大基因科技有限公司 Method and system for testing identity relations among multiple biological samples
CN106715712A (en) * 2014-09-30 2017-05-24 深圳华大基因科技有限公司 Method and system for testing identity relations among multiple biological samples
CN107217095A (en) * 2017-06-15 2017-09-29 广东腾飞基因科技股份有限公司 The mankind's paternity identification multiple PCR primer group and detection method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3938536A4 (en) * 2019-03-12 2023-03-08 Crown Bioscience (Suzhou) Inc. Methods and compositions for identification of tumor models
WO2023219214A1 (en) * 2022-05-12 2023-11-16 Republic Of Korea(National Forensic Service Director Ministry Of Interior And Safety) Snps panel for kinship identification in korean and use thereof
CN115346594A (en) * 2022-08-24 2022-11-15 温州医科大学 Grandfather-grandfather relationship identification method, system, equipment and medium without participation of mother and mother
CN115346594B (en) * 2022-08-24 2023-09-05 温州医科大学 Ancestor relationship identification method, system, equipment and medium without raw mother participation
CN117423382A (en) * 2023-10-21 2024-01-19 云准医药科技(广州)有限公司 Single-cell barcode identity recognition method based on SNP polymorphism

Also Published As

Publication number Publication date
CN108647495B (en) 2020-04-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181012

Assignee: Zhengzhou Jinyu Clinical Laboratory Center Co.,Ltd.

Assignor: GUANGZHOU KINGMED DIAGNOSTICS GROUP Co.,Ltd.

Contract record no.: X2021980010019

Denomination of invention: Identification method, device, equipment and storage medium

Granted publication date: 20200410

License type: Common License

Record date: 20210928

EC01 Cancellation of recordation of patent licensing contract

Assignee: Zhengzhou Jinyu Clinical Laboratory Center Co.,Ltd.

Assignor: GUANGZHOU KINGMED DIAGNOSTICS GROUP Co.,Ltd.

Contract record no.: X2021980010019

Date of cancellation: 20220922

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181012

Assignee: Zhengzhou Jinyu Clinical Laboratory Center Co.,Ltd.

Assignor: GUANGZHOU KINGMED DIAGNOSTICS GROUP Co.,Ltd.

Contract record no.: X2022980016522

Denomination of invention: Identification method, device, equipment and storage medium

Granted publication date: 20200410

License type: Common License

Record date: 20220927