CN108694304A - A kind of personal status relationship identification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN108694304A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810490417.9A
Other languages
Chinese (zh)
Other versions
CN108694304B (en)
Inventor
刘晶星
陈白雪
郭周萍
严慧
赵薇薇
于世辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kingmed Diagnostics Group Co ltd
Guangzhou Kingmed Diagnostics Central Co Ltd
Original Assignee
Guangzhou Kingmed Diagnostics Group Co ltd
Guangzhou Kingmed Diagnostics Central Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kingmed Diagnostics Group Co ltd and Guangzhou Kingmed Diagnostics Central Co Ltd
Priority to CN201810490417.9A
Publication of CN108694304A
Application granted
Publication of CN108694304B
Legal status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Abstract

The present invention relates to a personal status relationship identification method, device, equipment and storage medium that can improve the validity and reliability of identification results. When identifying a personal status relationship, the method retrieves multiple target SNP sites one by one among the mutation sites of the sample mutation information to obtain mutation and sequencing information that includes the genotype of each target SNP site; selects, from the multiple target SNP sites, those that meet a preset requirement together with their mutation and sequencing information to obtain identification information; and finally compares the genotypes of corresponding target SNP sites in the identification information of different samples to identify the personal status relationship of the different samples. Because the rate at which a SNP mutates to another base is extremely low, and the influence of a single target SNP on the final result is limited even if it does mutate, identifying personal status relationships with SNPs can significantly improve the validity and reliability of the identification result compared with traditional methods based on STR detection.

Description

A kind of personal status relationship identification method, device, equipment and storage medium
Technical field
The present invention relates to the technical fields of molecular biology and bioinformatics, and more particularly to a personal status relationship identification method, device, equipment and storage medium.
Background art
At present, methods for identifying personal status relationships, such as individual identification and kinship identification (including paternity testing), are mainly based on STR (short tandem repeat) detection. Studies have found that the number of STRs in the human genome is very small compared with the number of SNPs (single nucleotide polymorphisms), and that few STRs discriminate strongly between individuals. Once STRs whose frequencies fluctuate widely between populations are excluded, the remaining usable STRs are very limited. Moreover, a single STR mutates more readily than a SNP, and because the number of usable STRs is limited, even a single mutation can have a large effect on the final identification result.
Summary of the invention
In view of the above technical problem, it is necessary to provide a personal status relationship identification method, device, equipment and storage medium that can improve the validity and reliability of identification results.
A personal status relationship identification method includes the following steps:
Step S1: Obtain sample mutation information derived from the aligned sequencing results;
Step S2: Retrieve multiple target SNP sites one by one among the mutation sites of the sample mutation information to obtain mutation and sequencing information that includes the genotype of each target SNP site;
The multiple target SNP sites are selected from the following 984 SNP sites:
Step S3: Select, from the multiple target SNP sites, the target SNP sites that meet a preset requirement together with their mutation and sequencing information to obtain identification information;
Step S4: Compare the genotypes of corresponding target SNP sites in the identification information of different samples to identify the personal status relationship of the different samples.
A personal status relationship identification device includes:
A mutation information acquisition module, for obtaining sample mutation information derived from the aligned sequencing results;
A target SNP information retrieval module, for retrieving multiple target SNP sites one by one among the mutation sites of the sample mutation information to obtain mutation and sequencing information that includes the genotype of each target SNP site; the multiple target SNP sites are selected from the following 984 SNP sites:
An identification information selection module, for selecting, from the multiple target SNP sites, the target SNP sites that meet a preset requirement together with their mutation and sequencing information to obtain identification information; and
A personal status relationship identification module, for comparing the genotypes of corresponding target SNP sites in the identification information of different samples to identify the personal status relationship of the different samples.
A computer device has a processor and a memory. The memory stores a computer program, and the processor implements the steps of the personal status relationship identification method of any of the above embodiments when executing the computer program.
A computer storage medium stores a computer program which, when executed, implements the steps of the personal status relationship identification method of any of the above embodiments.
With the above personal status relationship identification method, device, equipment and storage medium, when a personal status relationship is identified, multiple target SNP sites are retrieved one by one among the mutation sites of the sample mutation information to obtain mutation and sequencing information that includes the genotype of each target SNP site; the target SNP sites that meet a preset requirement, together with their mutation and sequencing information, are selected from the multiple target SNP sites to obtain identification information; and finally the genotypes of corresponding target SNP sites in the identification information of different samples are compared to identify the personal status relationship of the different samples. Because the rate at which a SNP mutates to another base is extremely low, and the influence of a single target SNP on the final result is limited even if it does mutate, identifying personal status relationships with SNPs can significantly improve the validity and reliability of the identification result compared with traditional methods based on STR detection.
Further research has found that, because individual identification and paternity testing are exact-match identifications with extremely low tolerance for mismatches, 20 STRs are generally enough to achieve reasonably good results for them. For other kinship identifications, however, not all sites match, which can lead to large random errors. For example, 50% of the haploid sites between grandparent and grandchild are not of common origin, so on average 10 of the 20 STRs match only as they would between random members of the population. With so few sites, the number of sites that actually match fluctuates widely at random, and the identification performance is very poor. By contrast, the number of SNPs in the human genome is enormous (the 1000 Genomes Project reports about 80 million polymorphic human SNPs, with about 3.5 to 4 million per person on average), so SNPs can provide better support for identifying all kinds of personal status relationships. SNPs can be used not only for paternity testing but also for individual identification and for kinship identification other than parent-child identification, with small error and high reliability.
Furthermore, STRs are traditionally detected by DNA fragment analysis, which is not a conventional DNA sequencing method. STRs are mostly located in intergenic regions, many of which are currently considered non-functional, and general sequencing projects do not cover these regions. When a personal status relationship needs to be identified in such a project, an additional STR detection experiment is usually required, which is time-consuming and laborious and raises the project cost. Gene exons and other functional non-coding regions, on the other hand, carry a large number of SNPs, enough for this purpose. The SNPs already measured in most research and clinical sequencing projects can therefore be used to identify all kinds of personal status relationships without additional experiments. The above personal status relationship identification method thus saves time and reduces testing cost.
Description of the drawings
Fig. 1 is a schematic flow diagram of the personal status relationship identification method of an embodiment;
Fig. 2 is a schematic flow diagram of a specific example of identifying the personal status relationship of different samples in Fig. 1;
Fig. 3 is a structural schematic diagram of the personal status relationship identification device of an embodiment;
Fig. 4 is a structural schematic diagram of a specific example of the personal status relationship identification module in Fig. 3.
Detailed description
To facilitate understanding of the present invention, the invention is described more fully below with reference to the relevant drawings, which show preferred embodiments of the invention. The invention can, however, be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and completely.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the description of the invention are intended only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
"Personal status relationship identification" as used herein includes individual identification, parent-child identification and kinship identification for other non-parent-child relationships, such as grandparent-grandchild, uncle-nephew, sibling, cousin and first-cousin-once-removed relationships. The "population mutation frequency" of a SNP site is the frequency, in a specific population (such as the Chinese population), of the base at that SNP site that differs from the reference sequence. The "allele population frequency" of a SNP site is the frequency of each allele of that SNP site in a specific population (such as the Chinese population). "Mutation quality" refers to the default quality control standard given by GATK (or other variant analysis software). A "read" is a sequence produced by a high-throughput sequencing platform (such as the various second-generation sequencing platforms). "Sequencing coverage" is the number of reads covering a sequenced site.
As shown in Fig. 1, an embodiment of the present invention provides a personal status relationship identification method that includes the following steps:
Step S110: Obtain sample mutation information derived from the aligned sequencing results.
Each sample can be sequenced by, but not limited to, second-generation sequencing to obtain a sequencing result. The sequencing result can then be aligned to the human reference genome and analyzed to produce a mutation file, which contains the sample mutation information of that sample. The sample mutation information includes the mutation sites, mutation frequencies, mutation quality and similar information. A mutation is defined relative to the reference genome: the sequencing result shows a variation that differs from the sequence of the corresponding region or site in the reference genome.
Step S120: Retrieve multiple target SNP sites one by one among the mutation sites of the sample mutation information to obtain mutation and sequencing information that includes the genotype of each target SNP site.
Each target SNP site is preferably located on an autosomal exon or in a functional non-coding region, with an allele population frequency between 0.45 and 0.55. For identifying a parent-child relationship (father-child or mother-child), retrieving 100 target SNP sites generally gives an accuracy of about 99.999%, and 960 sites give about $(100 - 10^{-53})\%$; for parent-child identification, the number of target SNP sites may therefore be required to be no fewer than 100. For other kinship identification, regardless of the number of target SNP sites, the subsequent analysis is an inference based on the expected number of mismatched sites. Although the conclusion cannot be 100% certain, the confidence is still high, and in general the more target SNP sites there are, the more reliable the result. For example, no fewer than 720 target SNP sites generally allow kinship identification at the level of cousins, no fewer than 480 at the level of grandparent-grandchild or uncle-nephew, and no fewer than 240 at the level of siblings. Individual identification tests whether the genotypes of the target SNP sites match exactly, and the number of target SNP sites may generally be required to be no fewer than 50.
In a specific example, multiple target SNP sites may be selected from the 984 target SNP sites shown in Table 1 below, which are located on autosomal exons and have allele population frequencies between 0.45 and 0.55 in the Chinese population. These target SNP sites are covered by most gene exon sequencing projects.
Table 1
Note: The reference sequence for the above SNP sites is hg19. Taking the target SNP site denoted "10|101293035|C|A" as an example, "|" is the field separator, "10" is the chromosome number, "101293035" is the coordinate position on that chromosome, "C" is the base that agrees with the corresponding site in the reference genome, and "A" is the other base, which differs from the corresponding site in the reference genome. The other target SNP sites follow the same convention.
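The site notation in the note can be split into its four fields mechanically. The following is a minimal sketch only; the `TargetSnp` type and the `parse_target_snp` function are hypothetical names introduced for illustration and do not appear in the patent.

```python
from typing import NamedTuple


class TargetSnp(NamedTuple):
    """One target SNP site (hypothetical container, hg19 coordinates)."""
    chrom: str  # chromosome number, e.g. "10"
    pos: int    # coordinate position on that chromosome
    ref: str    # base that agrees with the reference genome
    alt: str    # the other base, which differs from the reference


def parse_target_snp(key: str) -> TargetSnp:
    """Split a 'chrom|pos|ref|alt' key on the '|' field separator."""
    chrom, pos, ref, alt = key.strip().split("|")
    return TargetSnp(chrom, int(pos), ref, alt)


snp = parse_target_snp("10|101293035|C|A")
print(snp.chrom, snp.pos, snp.ref, snp.alt)
```

Keeping the chromosome as a string rather than an integer is a design choice that leaves room for chromosome names that are not purely numeric.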
When multiple target SNP sites are retrieved one by one among the mutation sites of the sample mutation information, the current target SNP may or may not be found. If a target SNP is found among the mutation sites of the sample mutation information, the genotype of that target SNP site is homozygous or heterozygous for the base that differs from the reference genotype, and the mutation and sequencing information obtained includes the genotype, allele population frequency, mutation quality and sequencing coverage of that target SNP site. If a target SNP is not found among the mutation sites of the sample mutation information, the genotype of that target SNP site is homozygous and consistent with the reference genotype, and the mutation and sequencing information obtained includes the genotype, allele population frequency and sequencing coverage of that target SNP site. Information such as the sequencing coverage can be calculated from the sample's sequencing alignment file (such as a bam file) for the target SNP site currently being retrieved.
Let the reference allele be denoted R and the mutated allele V. Humans are diploid, so for a target SNP that is found among the mutation sites of the sample mutation information, the genotype of the sample at that target SNP site is VV (homozygous) or RV (heterozygous); for a target SNP that is not found among the mutation sites of the sample mutation information, the genotype of the sample at that target SNP site is RR.
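The R/V rule above amounts to a dictionary lookup. The sketch below assumes, purely for illustration, that the sample's variant calls have already been loaded into a dictionary keyed by (chromosome, position, reference base, variant base) with a zygosity label of "het" or "hom"; that layout and the name `call_genotypes` are hypothetical, not part of the patent.

```python
from typing import Dict, Iterable, Tuple

# (chromosome, position, reference base, variant base): hypothetical key layout
SnpKey = Tuple[str, int, str, str]


def call_genotypes(target_snps: Iterable[SnpKey],
                   sample_variants: Dict[SnpKey, str]) -> Dict[SnpKey, str]:
    """Assign RR, RV or VV to every target SNP site of one sample."""
    genotypes = {}
    for snp in target_snps:
        zygosity = sample_variants.get(snp)
        if zygosity is None:
            # Not among the sample's mutation sites: homozygous reference.
            genotypes[snp] = "RR"
        elif zygosity == "het":
            genotypes[snp] = "RV"
        elif zygosity == "hom":
            genotypes[snp] = "VV"
        else:
            raise ValueError(f"unknown zygosity {zygosity!r} at {snp}")
    return genotypes


targets = [("10", 101293035, "C", "A"), ("1", 1000, "G", "T"), ("2", 2000, "A", "G")]
variants = {("10", 101293035, "C", "A"): "het", ("2", 2000, "A", "G"): "hom"}
print(call_genotypes(targets, variants))
```

An RR call made this way is only trustworthy when the site is adequately covered, which is why the coverage check in the next step applies to every site, including those absent from the variant list.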
Step S130: Select, from the multiple target SNP sites, the target SNP sites that meet a preset requirement together with their mutation and sequencing information to obtain identification information.
Specifically, meeting the preset requirement means that the sequencing coverage is more than 30 reads and that the mutation quality meets the default quality control standard of GATK.
The default quality control standard of GATK is QD > 2.0, MQ > 40.0, FS < 60.0, HaplotypeScore < 60.0, MQRankSum > -12.5 and ReadPosRankSum > -8.0.
Reliability analysis of the multiple target SNP sites filters out the high-quality sites shared by the samples and avoids the influence of unreliable sites on the result. Genotyping a target SNP site requires sufficient coverage and quality control; otherwise the site may well be mistyped by chance. For example, suppose that at some target SNP site the father is type AA and the son is type AT. If the son's coverage at that site is very low or the quality is poor, say only 5 reads, those 5 reads may all happen to be T, or poor quality may prevent A from being detected, and the son will finally be typed as TT.
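The preset requirement can be expressed as a small predicate. This is a sketch under stated assumptions: the thresholds are the ones listed in the text, while the function name `passes_preset_requirement`, the dictionary of annotations, and the choice to skip annotations that are absent (for example at reference-homozygous sites, which carry no mutation quality) are illustrative assumptions rather than the patent's implementation.

```python
import operator
from typing import Dict, Optional

MIN_COVERAGE = 30  # the site must be covered by more than 30 reads

# Annotation name -> (comparison, threshold), as listed in the text.
QUALITY_RULES = {
    "QD": (operator.gt, 2.0),
    "MQ": (operator.gt, 40.0),
    "FS": (operator.lt, 60.0),
    "HaplotypeScore": (operator.lt, 60.0),
    "MQRankSum": (operator.gt, -12.5),
    "ReadPosRankSum": (operator.gt, -8.0),
}


def passes_preset_requirement(coverage: int,
                              annotations: Optional[Dict[str, float]] = None) -> bool:
    """Return True if a target SNP site is reliable enough to keep."""
    if coverage <= MIN_COVERAGE:
        return False
    if annotations is None:
        # Assumption: a site without variant annotations (genotype RR)
        # is judged on coverage alone.
        return True
    for name, (compare, threshold) in QUALITY_RULES.items():
        value = annotations.get(name)
        if value is not None and not compare(value, threshold):
            return False
    return True


good = {"QD": 15.2, "MQ": 60.0, "FS": 1.3, "HaplotypeScore": 0.5,
        "MQRankSum": 0.1, "ReadPosRankSum": 0.4}
print(passes_preset_requirement(45, good))  # well covered, good quality
print(passes_preset_requirement(5, good))   # only 5 reads, as in the AT/TT example
```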
Step S140: Compare the genotypes of corresponding target SNP sites in the identification information of different samples to identify the personal status relationship of the different samples.
The genotypes of the selected target SNP sites that meet the preset requirement are collected to obtain the identification information, which can be written to an identification information file, for example in utag format.
The identification information file can be used for individual identification, kinship identification and so on.
Taking the parent-child relationship as an example, the parentage exclusion probability of a single target SNP site is $PE = 2p^2(1-p)^2$, where $p$ is the allele population frequency of the target SNP site. PE reaches its maximum value of 0.125 at $p = 0.5$. When $p$ is between 0.45 and 0.55, the parentage exclusion probability of a single target SNP site is at least 0.1225125. For 984 target SNP sites, the parentage exclusion probability obtained with the method of the present invention is far higher than that of the traditional paternity test based on 20 STR loci.
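The single-site formula is easy to check numerically. The sketch below also combines sites under the assumption that they are independent, so that the probability that no site excludes a non-parent is the product of the per-site values (1 - PE); that combination rule is a standard one supplied here for illustration and is not quoted from the patent. The function names are hypothetical.

```python
import math
from typing import Iterable


def exclusion_probability(p: float) -> float:
    """Single-site parentage exclusion probability PE = 2 * p^2 * (1 - p)^2."""
    return 2 * p ** 2 * (1 - p) ** 2


def log10_non_exclusion(frequencies: Iterable[float]) -> float:
    """log10 of the probability that no site excludes, assuming independent sites.

    Working in logarithms avoids rounding 1 - 10^-53 to exactly 1.0.
    """
    return sum(math.log10(1 - exclusion_probability(p)) for p in frequencies)


print(exclusion_probability(0.5))         # maximum, reached at p = 0.5
print(exclusion_probability(0.45))        # lower bound for p in [0.45, 0.55]
print(log10_non_exclusion([0.5] * 100))   # 100 sites
print(log10_non_exclusion([0.5] * 960))   # 960 sites
```

Expressed as a percentage, a non-exclusion probability with a log10 of about -55 to -56 corresponds to an accuracy of the order of (100 - 10^-53)%, which is in line with the figure quoted for 960 sites.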
In a specific example, as shown in Fig. 2, step S140 includes:
Step S141: Determine whether individual identification or kinship identification is to be performed on the different samples. For individual identification, execute step S142; otherwise execute step S143.
Step S142: Compare the genotypes of all corresponding target SNP sites of the different samples, and analyze from the comparison result whether the different samples belong to the same individual.
In principle, individual identification requires the genotypes of all corresponding target SNP sites to be completely consistent before the samples can be judged to come from the same individual. When a large number of target SNP sites are compared, however, a very small number of target SNP sites may show inconsistent genotypes, and such cases can be analyzed individually as appropriate, for example degradation of the sample DNA or mutation of certain SNPs of the tested individual during embryonic differentiation. Mutations generated during embryonic differentiation can cause slight genetic differences between different tissues of one human body, and the samples for individual identification may be taken from different parts of the body. Although this possibility is very low, it does exist, but it generally does not affect the judgment of individual identification.
Step S143: Count the number of matched target SNP sites and/or the number of unmatched target SNP sites among the corresponding target SNP sites of the different samples.
A matched target SNP site is a target SNP site at which the different samples have at least one allele in common. An unmatched target SNP site is a target SNP site at which both alleles of the different samples differ. The number of matched target SNP sites plus the number of unmatched target SNP sites equals the total number of target SNP sites in the identification information.
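The matched/unmatched definition can be sketched as follows, assuming each sample's identification information is held as a dictionary from a site identifier to a two-letter genotype such as "RR", "RV" or "VV". The dictionary layout and the name `count_matches` are assumptions made for the example.

```python
from typing import Dict, Tuple


def count_matches(sample_a: Dict[str, str],
                  sample_b: Dict[str, str]) -> Tuple[int, int]:
    """Return (matched, unmatched) counts over the sites present in both samples."""
    shared_sites = sample_a.keys() & sample_b.keys()
    matched = 0
    for site in shared_sites:
        # Matched: the two genotypes share at least one allele.
        if set(sample_a[site]) & set(sample_b[site]):
            matched += 1
    # Unmatched: both alleles differ, i.e. RR in one sample and VV in the other.
    return matched, len(shared_sites) - matched


father = {"snp1": "RR", "snp2": "RV", "snp3": "VV", "snp4": "RR"}
child = {"snp1": "VV", "snp2": "RR", "snp3": "VV", "snp4": "RV"}
matched, unmatched = count_matches(father, child)
print(matched, unmatched)
```

Restricting the count to sites present in both samples reflects that only target SNP sites passing the preset requirement in each sample can be compared.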
Step S144: Determine whether parent-child identification or identification of a kinship other than the parent-child relationship is to be performed on the different samples. For parent-child identification, execute step S145; otherwise execute step S146.
Step S145: Calculate the paternity index of each matched target SNP site from the genotypes of the target SNP sites and the corresponding allele population frequencies, determine the comprehensive paternity index from the paternity indices of the matched target SNP sites, and analyze from the comprehensive paternity index whether the different samples are in a parent-child relationship.
The paternity index PI of each matched target SNP site is calculated from the frequency $p_i$ of the matching allele, with PI taken as the sum over all cases in which a match is possible. The comprehensive paternity index CPI is the product of all PI values.
Whether the different samples are in a parent-child relationship can be analyzed from the comprehensive paternity index CPI; in general, for example, a parent-child relationship can be concluded when CPI > 1000.
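The CPI decision can be sketched without committing to a particular per-site formula: the function below takes the per-site PI values as given inputs and only implements the product and the CPI > 1000 threshold from the text. Summing logarithms instead of multiplying directly is an implementation choice made here to avoid overflow with hundreds of sites; the function names are hypothetical.

```python
import math
from typing import Iterable


def log10_cpi(pi_values: Iterable[float]) -> float:
    """log10 of the comprehensive paternity index (product of per-site PI values)."""
    total = 0.0
    for pi in pi_values:
        if pi <= 0:
            raise ValueError("PI values of matched sites must be positive")
        total += math.log10(pi)
    return total


def is_parent_child(pi_values: Iterable[float], threshold: float = 1000.0) -> bool:
    """Apply the CPI > threshold rule on the log10 scale."""
    return log10_cpi(pi_values) > math.log10(threshold)


# Illustrative PI values only, not measured data.
print(is_parent_child([2.0] * 10))  # product is 1024
print(is_parent_child([2.0] * 9))   # product is 512
```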
Step S146: Analyze the kinship of other non-parent-child relationships from the number of unmatched target SNP sites.
It can be understood that, in other embodiments, step S140 may perform only one or two of individual identification, parent-child identification and identification of kinships other than the parent-child relationship. Accordingly, in one specific example, step S140 includes: determining whether individual identification is to be performed on the different samples and, if so, comparing the genotypes of all corresponding target SNP sites of the different samples and analyzing from the comparison result whether the different samples belong to the same individual. In another specific example, step S140 includes: determining whether parent-child identification is to be performed on the different samples and, if so, calculating the paternity index of each matched target SNP site from the genotypes of the target SNP sites and the corresponding allele population frequencies, determining the comprehensive paternity index from the paternity indices of the matched target SNP sites, and analyzing from the comprehensive paternity index whether the different samples are in a parent-child relationship, where a matched target SNP site is a target SNP site at which the different samples have at least one allele in common. In yet another specific example, step S140 includes: determining whether kinship identification for other non-parent-child relationships is to be performed on the different samples and, if so, analyzing the kinship of the other non-parent-child relationships from the number of unmatched target SNP sites, where an unmatched target SNP site is a target SNP site at which both alleles of the different samples differ.
More specifically, in one example, if the number of unmatched target SNP sites is approximately the total number of target SNP sites divided by 16, the relationship may be considered grandparent-grandchild or uncle-nephew; if the number of unmatched target SNP sites is approximately the total number of target SNP sites divided by 32, the relationship may be considered sibling.
Here a concept is introduced: an uncorrelated site between two samples at the haploid level, that is, a site with no genetic relationship. Only uncorrelated sites can cause target SNPs of the two samples to mismatch. For a SNP with a population frequency of 0.5, the proportions of the three genotypes AA/BB/AB are 0.25, 0.25 and 0.5 respectively. A mismatch occurs when the two samples are AA and BB, with probability 2*0.25*0.25=0.125, i.e. 1/8, which is the maximum exclusion probability of a single SNP site.
Taking a total of 960 corresponding target SNP sites for two samples, with an allele population frequency of 0.5 at every target SNP site, the number of unmatched target SNP sites under different kinships is derived as follows:
1. The son inherits one chromosome of each pair entirely from the father, so the number of uncorrelated sites between father and son is 0;
2. Considering that exchange between non-sister chromatids during meiosis produces genetic recombination, the chromosome the son inherits from the father has an expectation of 0.5 of having been inherited from the grandfather, so the number of uncorrelated sites between grandparent and grandchild is 960*0.5=480;
3. Similarly, the chromosome the son inherits from the father has an expectation of 0.5 of having been inherited from the grandmother. The grandfather and the grandmother each have a 50% chance of passing this part of the chromosome to the uncle, so the expectation that the uncle carries it is 0.5*50%+0.5*50%=0.5, and the number of uncorrelated sites between uncle and nephew is 960*(1-0.5)=480;
4. For siblings, a site is uncorrelated only when both alleles come from different sources. For example, if the father is Aa and the mother is Bb, the elder brother must inherit AB and the younger brother ab, or Ab and aB respectively (crossed combinations). If the combination is non-crossed, such as AB and Ab, which share a common allele, the target SNP is a correlated, matching site. The probability of a crossed combination is 0.5*0.5=0.25, so the number of uncorrelated sites between siblings is 960*0.25=240;
5. As calculated above, the probability that uncle and nephew carry chromosome of the same origin is 0.5, and the probability that the uncle passes this part of the chromosome to the cousin is 0.5, so the probability that the cousin carries it is 0.25, and the number of uncorrelated sites between cousins is 960*(1-0.25)=720;
6. Similarly, the number of uncorrelated sites between first cousins once removed is 960*(1-0.125)=840.
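The arithmetic in the list above can be reproduced in a few lines. The sketch multiplies the uncorrelated fraction for each relationship by the per-site mismatch probability $2p^2(1-p)^2$; the dictionary keys are English labels chosen for the example (the label for item 6 in particular is an assumed rendering), and the "unrelated" entry with fraction 1 is an added assumption that follows the same logic rather than a value stated in the text.

```python
from fractions import Fraction

# Fraction of sites that are uncorrelated at the haploid level, per the list above.
UNCORRELATED_FRACTION = {
    "parent-child": Fraction(0),
    "grandparent-grandchild": Fraction(1, 2),
    "uncle-nephew": Fraction(1, 2),
    "siblings": Fraction(1, 4),
    "cousins": Fraction(3, 4),
    "first cousins once removed": Fraction(7, 8),
    "unrelated": Fraction(1),  # assumption: every site is uncorrelated
}


def expected_unmatched(total_sites: int, relationship: str,
                       p: Fraction = Fraction(1, 2)) -> Fraction:
    """Expected number of unmatched target SNP sites for a relationship."""
    mismatch_per_site = 2 * p ** 2 * (1 - p) ** 2  # 1/8 when p = 1/2
    uncorrelated_sites = total_sites * UNCORRELATED_FRACTION[relationship]
    return uncorrelated_sites * mismatch_per_site


for name in UNCORRELATED_FRACTION:
    print(name, expected_unmatched(960, name))
```

With 960 sites, the grandparent-grandchild and uncle-nephew values equal the total divided by 16 and the sibling value equals the total divided by 32, which are the rules of thumb quoted earlier. Exact fractions are used so that these identities hold without floating-point rounding.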
Table 2 below shows the expected number of uncorrelated sites and the expected number of unmatched target SNP sites for each kind of kinship.
Table 2
The above are the expected results when the allele population frequency of every SNP site is 0.5. In practice, SNP allele population frequencies deviate from 0.5, which lowers the exclusion rate and reduces the number of unmatched target SNPs.
Example testing and research show that, for multiple target SNP sites with allele population frequencies between 0.45 and 0.55 used as the retrieval objects in step S120 above, the expected numbers of unmatched target SNP sites in Table 2 above can be used as a reference when finally judging other kinships such as non-parent-child relationships.
In the above personal status relationship identification method, when a personal status relationship is identified, multiple target SNPs are retrieved one by one among the mutation sites contained in the sample mutation information to obtain the mutation and sequencing information of each target SNP; whether the reliability of each target SNP meets the preset requirement is judged from its mutation and sequencing information; the target SNPs that meet the preset requirement, together with their mutation and sequencing information, are selected to build the identification information; and finally the corresponding target SNPs and their mutation and sequencing information in the identification information of different samples are compared, so that the personal status relationship of the different samples is identified from the identification information. Because the rate at which a SNP mutates to another base is extremely low, and the influence of a single target SNP on the final result is limited even if it does mutate, identifying personal status relationships with SNPs can significantly improve the validity and reliability of the identification result compared with traditional methods based on STR detection.
Further research has found that, because individual identification and paternity testing are exact-match identifications with extremely low tolerance for mismatches, 20 STRs are generally enough to achieve reasonably good results for them. For other kinship identifications, however, not all sites match, which can lead to large random errors. For example, 50% of the haploid sites between grandparent and grandchild are not of common origin, so on average 10 of the 20 STRs match only as they would between random members of the population. With so few sites, the number of sites that actually match fluctuates widely at random, and the identification performance is very poor. By contrast, the number of SNPs in the human genome is enormous (the 1000 Genomes Project reports about 80 million polymorphic human SNPs, with about 3.5 to 4 million per person on average), so SNPs can provide better support for identifying all kinds of personal status relationships. SNPs can be used not only for paternity testing but also for individual identification and for kinship identification other than parent-child identification, with small error and high reliability.
Further, traditional STR detection relies on DNA fragment analysis rather than conventional DNA sequencing, and STRs mostly lie in intergenic regions, many of which are currently considered non-functional. General sequencing projects do not cover these regions, so when such a project needs to identify a personal status relationship, an additional STR detection experiment usually has to be run, which is time-consuming and laborious and raises the project cost. Gene exons and other functional non-coding regions, however, carry an ample number of SNPs, so the SNPs already measured in most research and clinical sequencing projects can be used to identify personal status relationships of all kinds without additional experiments. For example, if a clinical sequencing project for a genetic disease is received and, after whole-exome sequencing analysis, the clinician suspects a consanguineous marriage, the above personal status relationship identification method can be applied directly to the SNPs already sequenced, with no additional experiment. The above method therefore saves time and reduces testing cost.
As shown in Fig. 3, based on the same idea as above, an embodiment of the invention further provides a personal status relationship identification apparatus 200, which comprises:
a mutation information acquisition module 210, configured to obtain sample mutation information produced by aligning sequencing results;
a target SNP information retrieval module 220, configured to search the mutation sites of the sample mutation information for a plurality of target SNP sites one by one and to obtain mutation and sequencing information, including the genotype, of each target SNP site;
an identification information selection module 230, configured to select, from the plurality of target SNP sites, the target SNP sites that meet a preset requirement together with their mutation and sequencing information, to obtain identification information; and
a personal status relationship identification module 240, configured to compare the genotypes of corresponding target SNP sites in the identification information of different samples and to identify the personal status relationship between the different samples.
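The retrieval and selection performed by modules 220 and 230 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the data layout (a dict of called variants keyed by site ID plus a per-site coverage dict), the names `SiteInfo`, `retrieve_target_snps` and `select_identification_info`, and the boolean `passes_qc` flag standing in for the mutation-quality check are all hypothetical. Only the 30-read coverage threshold (claim 4) and the rule that a searched-for site absent from the variant list is treated as homozygous for the reference allele (claim 3) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MIN_COVERAGE = 30  # reads; threshold stated in claim 4


@dataclass
class SiteInfo:
    genotype: Tuple[str, str]  # two alleles, e.g. ("A", "G")
    coverage: int              # sequencing depth at the site
    passes_qc: bool = True     # hypothetical stand-in for the mutation-quality check


def retrieve_target_snps(
    variants: Dict[str, SiteInfo],
    coverage: Dict[str, int],
    targets: Dict[str, str],
) -> Dict[str, SiteInfo]:
    """Look up each searched-for SNP site in the sample's variant calls.

    `targets` maps site ID -> reference allele. A site that was not called
    as a variant is recorded as homozygous for the reference allele.
    """
    result = {}
    for site_id, ref_allele in targets.items():
        if site_id in variants:
            result[site_id] = variants[site_id]
        else:
            depth = coverage.get(site_id, 0)
            result[site_id] = SiteInfo((ref_allele, ref_allele), depth)
    return result


def select_identification_info(sites: Dict[str, SiteInfo]) -> Dict[str, SiteInfo]:
    """Keep only the sites that meet the coverage and quality requirements."""
    return {
        site_id: info
        for site_id, info in sites.items()
        if info.coverage >= MIN_COVERAGE and info.passes_qc
    }


# Toy example with made-up site IDs and values
targets = {"snp1": "A", "snp2": "C", "snp3": "G"}
variants = {"snp1": SiteInfo(("A", "G"), 45)}
coverage = {"snp2": 60, "snp3": 12}
info = select_identification_info(retrieve_target_snps(variants, coverage, targets))
print(sorted(info))  # snp3 is dropped because its coverage is below 30 reads
```

Keeping retrieval and selection as separate functions mirrors the split between modules 220 and 230, so the reliability filter can be changed without touching the lookup.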
In a specific example, the personal status relationship identification module 240 includes a first judgment module 241, an individual identification module 242, a match statistics module 243, a second judgment module 244, a parenthood determination module 245 and an other-kinship identification module 246.
The first judgment module 241 is configured to judge whether individual identification or kinship identification is to be performed on the different samples.
The individual identification module 242 is configured to compare the genotypes of all corresponding target SNP sites of the different samples and to analyze, from the comparison result, whether the different samples belong to the same individual.
The match statistics module 243 is configured to count the number of matched target SNP sites and/or the number of unmatched target SNP sites among the corresponding target SNP sites of the different samples.
The second judgment module 244 is configured to judge whether parenthood determination or identification of other kinship besides the parent-child relationship is to be performed on the different samples.
The parenthood determination module 245 is configured to calculate the paternity index of each matched target SNP site from the genotype of the target SNP site and the corresponding allele population frequency, to determine a comprehensive paternity index from the paternity indices of the matched target SNP sites, and to analyze from the comprehensive paternity index whether the different samples are in a parent-child relationship.
The other-kinship identification module 246 is configured to analyze the non-parent-child kinship from the number of unmatched target SNP sites.
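The comparisons carried out by modules 242, 243 and 245 can be sketched as follows. This is a hedged illustration rather than the patented implementation: genotypes are assumed to be unordered allele pairs stored in dicts keyed by site ID, and the function names are hypothetical. The comprehensive (combined) paternity index is taken here as the product of the per-site indices, which is the usual convention but is not spelled out in the text. The per-site paternity index values are passed in rather than computed, because the text does not give their formula. The definitions of a matched site (at least one shared allele) and an unmatched site (no shared allele) follow claims 6 and 7.

```python
import math
from typing import Dict, Iterable, Tuple

Genotype = Tuple[str, str]


def genotype_concordance(a: Dict[str, Genotype], b: Dict[str, Genotype]) -> float:
    """Fraction of shared sites with identical genotypes (for individual identification)."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the two samples share no sites")
    same = sum(sorted(a[s]) == sorted(b[s]) for s in shared)
    return same / len(shared)


def count_matches(a: Dict[str, Genotype], b: Dict[str, Genotype]) -> Tuple[int, int]:
    """Return (matched, unmatched) site counts over the sites present in both samples."""
    matched = unmatched = 0
    for site in a.keys() & b.keys():
        if set(a[site]) & set(b[site]):  # at least one allele in common
            matched += 1
        else:                            # both alleles differ
            unmatched += 1
    return matched, unmatched


def combined_paternity_index(per_site_pi: Iterable[float]) -> float:
    """Assumed convention: multiply the per-site paternity indices."""
    return math.prod(per_site_pi)


# Toy genotypes with made-up site IDs
child = {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "T")}
parent = {"snp1": ("A", "A"), "snp2": ("C", "T"), "snp3": ("G", "G")}
print(count_matches(child, parent))
print(genotype_concordance(child, parent))
print(combined_paternity_index([1.5, 2.0]))
```

How the concordance, the unmatched count or the combined index is turned into a conclusion (for example, which thresholds apply) is not specified in this section, so the sketch stops at computing the quantities.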
Based on the embodiments described above, the invention further provides a computer device usable for personal status relationship identification, having a processor and a memory. The memory stores a computer program, and the processor, when executing the computer program, implements the steps of the personal status relationship identification method of any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the flows of the above methods can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium; in the embodiments of the invention, the program can be stored in a storage medium of a computer system and executed by at least one processor of that computer system to implement the flows of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Accordingly, the invention further provides a computer storage medium usable for processing sequencing library-construction primer sequences, on which a computer program is stored; when executed, the computer program implements the steps of the personal status relationship identification method of any of the above embodiments.
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments described above express only several implementations of the invention. Their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, all of which fall within the protection scope of the invention. The protection scope of this patent shall therefore be determined by the appended claims.

Claims (10)

1. A personal status relationship identification method, characterized by comprising the following steps:
Step S1: obtaining sample mutation information produced by aligning sequencing results;
Step S2: searching the mutation sites of the sample mutation information for a plurality of target SNP sites one by one, and obtaining mutation and sequencing information, including the genotype, of each target SNP site;
the plurality of target SNP sites being selected from the following 984 SNP sites:
Step S3: selecting, from the plurality of target SNP sites, the target SNP sites that meet a preset requirement together with their mutation and sequencing information, to obtain identification information;
Step S4: comparing the genotypes of corresponding target SNP sites in the identification information of different samples, to identify the personal status relationship between the different samples.
2. The personal status relationship identification method according to claim 1, characterized in that, for individual identification, at least 50 of the 984 SNP sites are selected as target SNP sites;
for parenthood determination, at least 100 of the 984 SNP sites are selected as target SNP sites;
for identification of other kinship besides parenthood determination, at least 480 of the 984 SNP sites are selected as target SNP sites to identify kinship at the grandparent-grandchild or uncle-nephew level, or at least 720 of the 984 SNP sites are selected as target SNP sites to identify kinship at the cousin level, or at least 240 of the 984 SNP sites are selected as target SNP sites to identify kinship at the sibling level.
3. The personal status relationship identification method according to claim 1, characterized in that, in step S2, a target SNP that can be retrieved among the mutation sites of the sample mutation information has a genotype that is homozygous or heterozygous and inconsistent with the reference genotype, and the mutation and sequencing information obtained includes the genotype, allele population frequency, mutation quality and sequencing coverage of that target SNP site;
a target SNP that cannot be retrieved among the mutation sites of the sample mutation information has a homozygous genotype consistent with the reference genotype, and the mutation and sequencing information obtained includes the genotype, allele population frequency and sequencing coverage of that target SNP site.
4. The personal status relationship identification method according to claim 3, characterized in that, in step S3, meeting the preset requirement means that the sequencing coverage is not less than 30 reads and the mutation quality meets the default quality-control standard of GATK.
5. The personal status relationship identification method according to any one of claims 1 to 4, characterized in that step S4 includes:
Step S41: judging whether individual identification is to be performed on the different samples and, if so, comparing the genotypes of all corresponding target SNP sites of the different samples and analyzing from the comparison result whether the different samples belong to the same individual.
6. The personal status relationship identification method according to any one of claims 1 to 4, characterized in that step S4 includes:
Step S42: judging whether parenthood determination is to be performed on the different samples and, if so, calculating the paternity index of each matched target SNP site from the genotype of the target SNP site and the corresponding allele population frequency, determining a comprehensive paternity index from the paternity indices of the matched target SNP sites, and analyzing from the comprehensive paternity index whether the different samples are in a parent-child relationship;
a matched target SNP site being a target SNP site at which the different samples have at least one allele in common.
7. The personal status relationship identification method according to any one of claims 1 to 4, characterized in that step S4 includes:
Step S43: judging whether identification of kinship other than the parent-child relationship is to be performed on the different samples and, if so, analyzing the non-parent-child kinship from the number of unmatched target SNP sites;
an unmatched target SNP site being a target SNP site at which both alleles of the different samples differ.
8. A personal status relationship identification apparatus, characterized by comprising:
a mutation information acquisition module, configured to obtain sample mutation information produced by aligning sequencing results;
a target SNP information retrieval module, configured to search the mutation sites of the sample mutation information for a plurality of target SNP sites one by one and to obtain mutation and sequencing information, including the genotype, of each target SNP site; the plurality of target SNP sites being selected from the following 984 SNP sites:
an identification information selection module, configured to select, from the plurality of target SNP sites, the target SNP sites that meet a preset requirement together with their mutation and sequencing information, to obtain identification information; and
a personal status relationship identification module, configured to compare the genotypes of corresponding target SNP sites in the identification information of different samples and to identify the personal status relationship between the different samples.
9. A computer device, characterized by having a processor and a memory, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the personal status relationship identification method according to any one of claims 1 to 7.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the personal status relationship identification method according to any one of claims 1 to 7.
CN201810490417.9A 2018-05-21 2018-05-21 Identity relationship identification method, device, equipment and storage medium Active CN108694304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810490417.9A CN108694304B (en) 2018-05-21 2018-05-21 Identity relationship identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108694304A true CN108694304A (en) 2018-10-23
CN108694304B CN108694304B (en) 2020-03-24

Family

ID=63847606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810490417.9A Active CN108694304B (en) 2018-05-21 2018-05-21 Identity relationship identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108694304B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539967A (en) * 2008-12-12 2009-09-23 深圳华大基因研究院 Method for detecting mononucleotide polymorphism
CN104619863A (en) * 2012-07-13 2015-05-13 生命技术公司 Human identifiation using a panel of SNPs
CN102978286A (en) * 2012-12-08 2013-03-20 上海迪道科技有限公司 Method for paternity test through utilizing specific single nucleotide polymorphism (SNP) combination
WO2016049878A1 (en) * 2014-09-30 2016-04-07 深圳华大基因科技有限公司 Snp profiling-based parentage testing method and application

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091869A (en) * 2020-01-13 2020-05-01 北京奇云诺德信息科技有限公司 Genetic relationship identification method using SNP as genetic marker
CN115572770A (en) * 2022-09-05 2023-01-06 上海蓝沙生物科技有限公司 Method for judging genetic relationship through SNP (single nucleotide polymorphism) mismatch rate
CN115572770B (en) * 2022-09-05 2023-06-30 上海蓝沙生物科技有限公司 Method for judging genetic relationship through SNP mismatch rate

Also Published As

Publication number Publication date
CN108694304B (en) 2020-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181023

Assignee: Zhengzhou Jinyu Clinical Laboratory Center Co.,Ltd.

Assignor: GUANGZHOU KINGMED DIAGNOSTICS GROUP Co.,Ltd.

Contract record no.: X2021980010019

Denomination of invention: An identity relationship identification method, device, equipment and storage medium

Granted publication date: 20200324

License type: Common License

Record date: 20210928

EC01 Cancellation of recordation of patent licensing contract

Assignee: Zhengzhou Jinyu Clinical Laboratory Center Co.,Ltd.

Assignor: GUANGZHOU KINGMED DIAGNOSTICS GROUP Co.,Ltd.

Contract record no.: X2021980010019

Date of cancellation: 20220922

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181023

Assignee: Zhengzhou Jinyu Clinical Laboratory Center Co.,Ltd.

Assignor: GUANGZHOU KINGMED DIAGNOSTICS GROUP Co.,Ltd.

Contract record no.: X2022980016522

Denomination of invention: An identification method, device, equipment and storage medium

Granted publication date: 20200324

License type: Common License

Record date: 20220927
