CN103114150A - Single nucleotide polymorphism site identification method based on digestion library-establishing and sequencing and bayesian statistics - Google Patents



Publication number
CN103114150A
Authority
CN
China
Legal status
Granted
Application number
CN2013100775091A
Other languages
Chinese (zh)
Other versions
CN103114150B (en)
Inventor
陶晔
钱刚
郑泽群
胡秋萍
Current Assignee
SHANGHAI MAJORBIO PHARM TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI MAJORBIO PHARM TECHNOLOGY Co Ltd
Priority date
2013-03-11
Filing date
2013-03-11
Publication date
2013-05-22
Application filed by SHANGHAI MAJORBIO PHARM TECHNOLOGY Co Ltd
Priority to CN201310077509.1A
Publication of CN103114150A
Application granted
Publication of CN103114150B
Status: Active

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for identifying single nucleotide polymorphism (SNP) sites based on restriction-enzyme digestion library construction, sequencing and Bayesian statistics. The method processes RAD (restriction-site associated DNA) sequencing data, searches for candidate SNPs on RAD sequencing fragments, and assesses the reliability of each SNP with a bioinformatic analysis based on Bayesian statistics. It can be applied to both model and non-model organisms, removing the limitation that many species lack a reference sequence and reducing sequencing cost. It also addresses the current bottleneck that no reliable statistical method exists for SNP identification from RAD data, so the accuracy of the SNP sites obtained is greatly improved.

Description

Method for identifying single nucleotide polymorphism sites based on restriction-enzyme digestion library sequencing and Bayesian statistics
Technical field
The present invention relates to a method for identifying single nucleotide polymorphism sites based on restriction-enzyme digestion library sequencing and Bayesian statistics. Specifically, it applies a dedicated Bayesian statistical test to single nucleotide polymorphism (SNP) sites obtained by single-end or paired-end sequencing of a restriction-enzyme digestion library, so that SNP genotypes can be identified accurately. Even when no reference genome sequence is available, the method gives the SNP test a sound statistical meaning. The method belongs to the field of bioinformatics and is of great importance for research on non-model organisms that lack reference sequences and for improving the accuracy of genotype identification.
Background
An SNP (single nucleotide polymorphism) is a variation of a single nucleotide in the genome. SNPs are very numerous: the human genome carries roughly one SNP per 1,000 bases, giving it rich polymorphism. SNPs are ideal markers for many kinds of molecular biology research, such as genetic map construction, genotyping, molecular markers, disease prediction and medication guidance.
Second-generation DNA sequencing is a low-cost, high-throughput sequencing technology whose basic principle is sequencing by synthesis. Taking Solexa sequencing as an example, the DNA is first randomly fragmented by physical methods, and defined adapters carrying amplification primer sequences are then ligated to both ends of each fragment. During sequencing, DNA polymerase synthesizes the complementary strand of the fragment being sequenced, and the base sequence is read by detecting the fluorescent signal carried by each newly incorporated base, yielding the sequence of the fragment.
Second-generation sequencing has been widely used in many fields of bioscience, in particular for studying polymorphism between different individuals of a species. The traditional way to find SNP markers is to sequence individuals to obtain short reads, align these reads back to a reference sequence with short-read alignment software, and thereby obtain the SNP information of the sequenced individuals. Common workflows (general procedure shown in Fig. 1) are: aligning reads to the reference with BWA and processing the alignments with SAMtools to find SNP sites [1,2], or aligning reads with SOAP and processing the alignments with SOAPsnp [3,4]. SNP markers are easy to find for species that have a reference sequence, but non-model organisms generally have none, and without a reference sequence the traditional approach to finding SNP markers faces a technical bottleneck.
RAD sequencing uses a new library construction approach (restriction-enzyme digestion library construction); its detailed sequencing process is shown in Fig. 2. Genomic DNA is cut at specific sites with a restriction enzyme, the digested DNA is randomly sheared by physical methods, DNA molecules of a specific length are selected by agarose gel separation, specific amplification adapters and sequencing adapters are ligated to the ends of the selected DNA, and the resulting library is loaded for high-throughput sequencing.
RAD sequencing is well known in the art; see, for example, references [5,6,7,8]. SNP identification with RAD sequencing has succeeded in many fields, but until the present invention the methods used generally relied on empirical thresholds for screening and filtering. For example, in reference [6] a site is called heterozygous when the ratio of the depths of its two bases (depth of the lower-depth base : depth of the higher-depth base) lies between 0.25 and 0.75, and homozygous when the ratio is below 0.1. This approach has no statistical significance and is strongly affected by external factors such as total sequencing output, so the accuracy of the resulting SNP genotypes cannot be guaranteed. Reference [9] improves on the empirical threshold method by using maximum likelihood to correct site genotypes, but its main problem is that the error rate used in the statistical model must be determined.
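For comparison with the Bayesian step later in this document, the empirical threshold rule attributed to reference [6] can be sketched as below. This is a minimal illustration, not the code of reference [6]: the function name `empirical_call` is hypothetical, the total-depth cutoff of 6 is borrowed from the bottle gourd example later in this document, and treating ratios outside the two stated intervals as missing is an assumption.

```python
def empirical_call(depth_a: int, depth_b: int, min_depth: int = 6) -> str:
    """Genotype one site from the depths of its two most frequent bases.

    ratio = lower depth / higher depth (always between 0 and 1);
    0.25 <= ratio <= 0.75 -> heterozygous, ratio < 0.1 -> homozygous.
    """
    high, low = max(depth_a, depth_b), min(depth_a, depth_b)
    # Hypothetical total-depth cutoff; also guards against division by zero.
    if high == 0 or high + low < min_depth:
        return "missing"
    ratio = low / high
    if 0.25 <= ratio <= 0.75:
        return "heterozygous"
    if ratio < 0.1:
        return "homozygous"
    # Ratios in [0.1, 0.25) or above 0.75 are not covered by the rule as
    # stated; reporting them as missing is an assumption of this sketch.
    return "missing"
```

For example, `empirical_call(9, 3)` gives a ratio of about 0.33 and returns `"heterozygous"`, whatever the total sequencing depth; the fixed cutoffs ignore how likely such a count is under a given error rate, which is the weakness the Bayesian test addresses.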
References:
1. Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 2009. 25(14): p. 1754-60.
2. Li, H., et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009. 25(16): p. 2078-9.
3. Li, R., et al., SNP detection for massively parallel whole-genome resequencing. Genome Res, 2009. 19(6): p. 1124-32.
4. Li, R., et al., SOAP: short oligonucleotide alignment program. Bioinformatics, 2008. 24(5): p. 713-4.
5. Houston, R.D., et al., Characterisation of QTL-linked and genome-wide restriction site-associated DNA (RAD) markers in farmed Atlantic salmon. BMC Genomics, 2012. 13(1): p. 244.
6. Scaglione, D., et al., RAD tag sequencing as a source of SNP markers in Cynara cardunculus L. BMC Genomics, 2012. 13(1): p. 3.
7. Davey, J.W., et al., Special features of RAD Sequencing data: implications for genotyping. Mol Ecol, 2012.
8. Dasmahapatra, K.K., et al., Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature, 2012.
9. Hohenlohe, P.A., et al., Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet. 6(2): p. e1000862.
Summary of the invention
The object of the invention is to provide a method for identifying single nucleotide polymorphism sites based on restriction-enzyme digestion library sequencing and Bayesian statistics: a technical solution that processes sequencing data obtained by restriction-enzyme digestion library sequencing (RAD sequencing), searches for single nucleotide polymorphism sites within and between individuals, and provides a statistical test for them.
The object of the invention is achieved by the following technical solution:
A method for identifying single nucleotide polymorphism sites based on restriction-enzyme digestion library sequencing and Bayesian statistics, comprising the following steps:
1) After the sequencing results of RAD high-throughput sequencing are obtained, the RAD restriction-end reads are filtered to remove unqualified reads.
The RAD high-throughput sequencing may use Illumina GA sequencing or any other existing high-throughput sequencing technology.
Unqualified reads are reads in which the bases with sequencing quality below a predetermined low-quality threshold make up more than 50% of the bases in the read, and reads that lack the restriction-site signature.
2) From the restriction-end reads of each sequenced individual, the stack information of each individual is generated by grouping identical sequences, and the sequencing depth of each stack is calculated. For example, each filtered restriction-end read of an individual is used as a hash key, and the hash value points to a linked list that stores the reads from the other end; the sequencing depth is computed at the same time. This process can be implemented in any programming language.
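The hash-plus-linked-list structure of step 2 can be sketched in Python, where a built-in `dict` stands in for the hash table and a `list` for the linked list of other-end reads. This is a minimal sketch assuming the reads are already filtered and supplied as (restriction-end read, other-end read or `None`) pairs; the `Stack` class and `build_stacks` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Stack:
    """All reads whose restriction-end sequence is exactly identical."""
    sequence: str
    mates: list = field(default_factory=list)  # other-end reads (paired-end data only)
    depth: int = 0                             # times this sequence was read


def build_stacks(reads: Iterable[tuple[str, Optional[str]]]) -> dict[str, Stack]:
    """Group reads into stacks keyed by their restriction-end sequence."""
    stacks: dict[str, Stack] = {}
    for cut_end, mate in reads:
        stack = stacks.get(cut_end)
        if stack is None:  # first occurrence of this exact sequence
            stack = stacks[cut_end] = Stack(cut_end)
        stack.depth += 1
        if mate is not None:
            stack.mates.append(mate)
    return stacks
```

Because identical sequences hash to the same key, each stack's `depth` is simply the number of reads that landed on it.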
3) All stacks within an individual are compared pairwise, one by one, and the stacks are clustered to determine candidate heterozygous SNP sites within the individual.
For the clustering in 3), a cluster containing only one stack indicates that there is no heterozygous site on that restriction-end sequencing fragment, while a cluster containing two stacks indicates that a heterozygous site is present on it.
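The pairwise comparison and clustering of step 3 can be sketched as follows. The text does not state how many mismatches are allowed between stacks of one cluster, so this sketch assumes equal-length stacks that differ at no more than one position, groups them with a simple union-find, and, following claim 4, discards clusters with more than two stacks. The names `hamming`, `cluster_stacks` and `candidate_het_sites` are hypothetical, and the input is assumed to be the distinct stack sequences of one individual.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_stacks(seqs: list[str], max_mismatch: int = 1) -> list[list[str]]:
    """Group distinct stack sequences that differ by at most `max_mismatch` bases."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if len(seqs[i]) == len(seqs[j]) and hamming(seqs[i], seqs[j]) <= max_mismatch:
            parent[find(i)] = find(j)
    groups: dict[int, list[str]] = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def candidate_het_sites(seqs: list[str]) -> list[tuple[str, str, int]]:
    """(stack A, stack B, 0-based SNP position) for every two-stack cluster.

    One-stack clusters carry no heterozygous site; clusters with more than
    two stacks are dropped, as in claim 4.
    """
    sites = []
    for group in cluster_stacks(seqs):
        if len(group) == 2:
            a, b = group
            pos = next(k for k, (x, y) in enumerate(zip(a, b)) if x != y)
            sites.append((a, b, pos))
    return sites
```

For instance, `candidate_het_sites(["AATTCGA", "AATTCGT", "AATTCCC"])` pairs the first two stacks (one mismatch at position 6) and leaves the third on its own.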
4) All stacks of different individuals are compared pairwise, and the stacks are clustered to determine candidate SNP sites between individuals.
For the clustering in 4), a cluster with only one stack indicates that there is no SNP site between the two individuals, and a cluster with two stacks indicates an SNP site between them.
Because a Bayesian statistical method is applied afterwards, no depth filtering is applied here to individual stacks or clusters. This is one of the advantages of the method, as it retains as many SNP sites as possible.
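A minimal sketch of the between-individual comparison of step 4 is given below, assuming each individual's stacks are held in a dict mapping sequence to depth and that a candidate SNP is a pair of equal-length stacks from the two individuals that differ at exactly one base. No depth filter is applied and nothing is discarded, in line with the note above and with claim 5. The function name `between_individual_snps` is hypothetical.

```python
def between_individual_snps(
    ind1: dict[str, int], ind2: dict[str, int]
) -> list[tuple[str, str, int, int, int]]:
    """Stacks of two individuals that differ at exactly one base.

    ind1, ind2: stack sequence -> sequencing depth for each individual.
    Returns (stack of ind1, stack of ind2, 0-based position, depth1, depth2).
    All pairs are kept; no depth filtering is done here.
    """
    hits = []
    for s1, d1 in ind1.items():
        for s2, d2 in ind2.items():
            if len(s1) != len(s2):
                continue
            diffs = [k for k, (x, y) in enumerate(zip(s1, s2)) if x != y]
            if len(diffs) == 1:
                hits.append((s1, s2, diffs[0], d1, d2))
    return hits
```

Keeping the depths alongside each hit means the base-type and depth summary needed by the Bayesian step can be assembled directly from the output.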
5) A Bayesian statistical method is used to analyze the depth of each genotype at each candidate SNP site in order to assess the accuracy of the candidate SNPs. Because there is no reference sequence and no prior probability for each base at each site, the actual error rate of the observed bases at each site cannot be obtained. The Bayesian method therefore exhaustively traverses all plausible error rates, selects the case that gives the genotype the highest probability, and takes that genotype as the genotype of the SNP site. The formulas and calculation steps are as follows:
Each candidate SNP site can carry up to 4 possible bases, i.e. any one or more of A, T, C and G. The base type with the highest frequency (greatest depth) is defined as G1 with depth N1, and the remaining bases are defined in decreasing order of depth, giving genotypes Gi and depths Ni (i = 1, 2, 3, 4). Biologically, generally only two base types occur at an SNP site. For example, if the sequencing data show A or T at a site (N_A ≥ N_T), the site must be either homozygous or heterozygous for these bases. The Bayesian method therefore considers only these two genotypes, whose conditional probabilities under error rate ε are:
$$P(N_1,N_2,N_3,N_4 \mid G_{ii},\varepsilon)=\frac{N!}{N_1!\,N_2!\,N_3!\,N_4!}\,(1-0.75\varepsilon)^{N_i}\,(0.25\varepsilon)^{N-N_i}$$
$$P(N_1,N_2,N_3,N_4 \mid G_{ij},\varepsilon)=\frac{N!}{N_1!\,N_2!\,N_3!\,N_4!}\,(0.5-0.25\varepsilon)^{N_i+N_j}\,(0.25\varepsilon)^{N-N_i-N_j}$$
where N = N1 + N2 + N3 + N4.
[Formula image not available in the source text.]
The posterior probability of each genotype is:
[Formula image not available in the source text.]
Because there is no reference sequence and no prior study data, the value of ε cannot be fixed; however, references [10,11] report an Illumina GA sequencing error rate of about 1%. Here ε is varied from 0.01% to 5% in steps of 0.01%. The final posterior probability used is:
$$P(N_{ij})_{\text{Final}}=\max_{\varepsilon} P(N_{ij},\varepsilon),\qquad i,j\in\{1,2,3,4\},\ \varepsilon\in[0.01\%,\,5\%]$$
If P(N_ij)_Final is not less than 0.95, the genotype of the SNP site is ij; otherwise the site is recorded as missing data (a genotype that cannot be called).
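The genotype test of step 5 can be sketched as below. Because the posterior formula survives only as an image in the source, this sketch assumes equal prior probabilities for the two candidate genotypes (homozygous G1G1 and heterozygous G1G2), so each posterior is that genotype's likelihood divided by the sum of the two. It uses the likelihoods given above, with every non-genotype base observed with probability 0.25ε, computes them in log space to avoid underflow at high depth, and scans ε from 0.01% to 5% in 0.01% steps. How to break the case where both genotypes reach the cutoff at different error rates is not stated in the source; keeping the larger posterior is an assumption. The name `bayes_genotype` is hypothetical.

```python
import math

BASES = "ACGT"
ERROR_GRID = [k / 10000 for k in range(1, 501)]  # 0.01%, 0.02%, ..., 5%


def _sigmoid(x: float) -> float:
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bayes_genotype(depths: dict[str, int], cutoff: float = 0.95):
    """Call a site from per-base depths such as {"C": 9, "T": 3}.

    Returns (genotype or None, (p_hom, eps_hom, p_het, eps_het)); the tuple
    mirrors the "x1:x2:x3:x4" layout described for Fig. 8.
    """
    ranked = sorted(BASES, key=lambda b: depths.get(b, 0), reverse=True)
    n = [depths.get(b, 0) for b in ranked]  # N1 >= N2 >= N3 >= N4
    total = sum(n)
    best = {"hom": (0.0, None), "het": (0.0, None)}
    for eps in ERROR_GRID:
        log_err = math.log(0.25 * eps)  # probability of any non-genotype base
        # The multinomial coefficient is identical for both genotypes and
        # cancels in the posterior, so it is left out of the log-likelihoods.
        log_hom = n[0] * math.log(1 - 0.75 * eps) + (total - n[0]) * log_err
        log_het = (n[0] + n[1]) * math.log(0.5 - 0.25 * eps) + (total - n[0] - n[1]) * log_err
        # Equal priors on G1G1 and G1G2 (assumption): posterior = logistic of the log-ratio.
        for name, p in (("hom", _sigmoid(log_hom - log_het)),
                        ("het", _sigmoid(log_het - log_hom))):
            if p > best[name][0]:
                best[name] = (p, eps)
    (p_hom, eps_hom), (p_het, eps_het) = best["hom"], best["het"]
    if max(p_hom, p_het) < cutoff:
        genotype = None  # missing data
    elif p_hom >= p_het:
        genotype = ranked[0] * 2
    else:
        genotype = "".join(sorted(ranked[0] + ranked[1]))
    return genotype, (p_hom, eps_hom, p_het, eps_het)
```

With this sketch, `bayes_genotype({"C": 9, "T": 3})` calls the heterozygote `"CT"` and `bayes_genotype({"A": 20})` calls `"AA"`. Note that taking the maximum over ε separately for each genotype can push both posteriors above 0.95 for weakly supported sites (for example 10 A and 1 C), which is why an explicit tie-break is needed.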
The technical solution of the invention uses bioinformatic analysis to process RAD (restriction-site associated DNA) sequencing data, find SNP site information on RAD sequencing fragments, and identify SNP genotypes with a Bayesian method. It overcomes the bottleneck of non-model organisms lacking a reference sequence and obtains accurate results while reducing cost. It is the first to introduce Bayesian statistics into the identification of SNP site genotypes; compared with the previous empirical threshold methods, statistical significance is markedly improved and accuracy increases accordingly.
References:
10. Li, Y., et al., State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum Genomics, 2010. 4(4): p. 271-7.
11. Xie, W., et al., Parent-independent genotyping for constructing an ultrahigh-density linkage map based on population sequencing. Proc Natl Acad Sci U S A. 107(23): p. 10578-83.
Brief description of the drawings
Fig. 1 is a schematic of the traditional workflow for finding SNP sites;
Fig. 2 is a schematic of the detailed RAD sequencing process: (A) genomic DNA is digested with a restriction enzyme and P1 adapters are ligated, each P1 adapter carrying a different barcode sequence; (B) samples with different P1 adapters are pooled and sheared; (C) the P2 adapter is ligated; (D) RAD tags are amplified and enriched;
Fig. 3 illustrates RAD sequencing;
Fig. 4 is a flow chart of the clustering process used to generate stacks; the EcoRI restriction enzyme is used in the example;
Fig. 5 is a schematic of the restriction-end sequence information in the stacks;
Fig. 6 is a flow schematic of finding SNP sites within and between individuals from the stacks;
Fig. 7 illustrates the base types and depths of candidate SNPs, showing 20 candidate SNP sites in 15 individuals; "C|9" means base C was sequenced 9 times at this site, and "C|9:T|3" means C was sequenced 9 times and T 3 times;
Fig. 8 illustrates the genotypes of SNP sites after Bayesian statistics: "a" and "b" denote two different homozygous genotypes and "h" the heterozygous genotype (for example, "a" stands for AA, "b" for CC and "h" for AC). In "x1:x2:x3:x4", x1 is the homozygous posterior probability, x2 the error rate at which the homozygous posterior probability is greatest, x3 the heterozygous posterior probability, and x4 the error rate at which the heterozygous posterior probability is greatest;
Fig. 9 shows the missing data after applying the empirical threshold method and the Bayesian method;
Fig. 10 shows statistics of the sites at which the empirical threshold method and the Bayesian method give different results;
Fig. 11 shows Sanger sequencing verification of randomly selected sites at which the empirical threshold method and the Bayesian method differ.
Embodiment
The technical features of the invention are further described below with reference to the drawings and specific embodiments.
As shown in Fig. 2, RAD sequencing differs from conventional high-throughput sequencing in that the genome must be completely digested with a restriction enzyme before the adapters are added; RAD-specific adapters are then ligated, and the rest of the library construction is the same as conventional library preparation. Fig. 3 illustrates single-end sequencing of RAD restriction ends. The restriction enzyme used in Fig. 3 is EcoRI, which recognizes the palindromic sequence G^AATTC on a DNA molecule and cuts between G and A. The digested DNA is broken into short fragments by physical methods, adapters are ligated to both ends of the fragments that contain the restriction-end sequence, and after PCR enrichment single-end sequencing is performed. The read length is generally 100 nt, but may also be 50 nt.
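To illustrate why every EcoRI restriction-end read starts with AATTC, the sketch below digests a sequence in silico and takes `read_len` bases on each side of every cut. It is a simplification under stated assumptions: it ignores random shearing, size selection and adapters, and assumes a palindromic site cut at the same offset on both strands. The helper names `revcomp` and `rad_tags` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rad_tags(genome: str, read_len: int = 100, site: str = "GAATTC", cut: int = 1) -> list[str]:
    """Restriction-end reads on both sides of every site (EcoRI: G^AATTC, cut=1).

    Every returned tag starts with the site remnant (AATTC for EcoRI).
    """
    tags = []
    start = genome.find(site)
    while start != -1:
        # Top strand: the fragment to the right of the cut.
        tags.append(genome[start + cut : start + cut + read_len])
        # Bottom strand: the fragment to the left, read 5'->3' on the bottom
        # strand, i.e. the reverse complement of the top-strand sequence.
        end = start + len(site) - cut
        tags.append(revcomp(genome[max(0, end - read_len) : end]))
        start = genome.find(site, start + 1)
    return tags
```

For example, `rad_tags("CCCCGAATTCTTTT", read_len=8)` yields one tag per side of the single site, and both begin with `AATTC`.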
The method for identifying single nucleotide polymorphism sites based on restriction-enzyme digestion library sequencing and Bayesian statistics comprises the following steps:
1) After the sequencing results of RAD high-throughput sequencing are obtained, the RAD paired-end reads are filtered to remove unqualified reads.
The RAD high-throughput sequencing may use Illumina GA sequencing or any other existing high-throughput sequencing technology.
Unqualified reads are reads in which the bases with sequencing quality below a predetermined low-quality threshold make up more than 50% of the bases in the read, and reads that lack the restriction-site signature.
The low-quality threshold depends on the specific sequencing technology and sequencing environment; for example, it may be set to a single-base sequencing quality below 20. A read in which the number of uncertain bases (such as N in Illumina GA output) exceeds 10% of the bases in the read is considered unqualified. Apart from the sample adapter sequence, reads are compared with other exogenous sequences introduced by the experiment, such as the various adapter sequences, and a read containing exogenous sequence is considered unqualified. For restriction-end reads, a read whose first few bases are not the restriction-site end sequence is filtered out (for example, with the restriction enzyme EcoRI, any read that does not start with "AATTC" is discarded).
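The filtering rules above can be sketched for a single restriction-end read as follows. The thresholds (quality 20, more than 50% low-quality bases, more than 10% N, EcoRI remnant AATTC) come from the text; the Phred+33 quality encoding, the exact-substring adapter check and the function name `passes_filters` are assumptions of this sketch.

```python
def passes_filters(seq: str, qual: str, adapters: list[str],
                   site_remnant: str = "AATTC", min_q: int = 20,
                   max_low_q_frac: float = 0.5, max_n_frac: float = 0.1,
                   phred_offset: int = 33) -> bool:
    """Return True if a restriction-end read should be kept."""
    if not seq.startswith(site_remnant):        # no restriction-site signature
        return False
    low_q = sum(1 for c in qual if ord(c) - phred_offset < min_q)
    if low_q > max_low_q_frac * len(seq):       # more than 50% of bases below Q20
        return False
    if seq.count("N") > max_n_frac * len(seq):  # more than 10% uncertain bases
        return False
    if any(a in seq for a in adapters):         # exogenous (e.g. adapter) sequence
        return False
    return True
```

A real pipeline would usually tolerate mismatches when searching for adapter sequence; exact matching keeps the sketch short.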
2) From the restriction-end reads of each sequenced individual, the stack information of each individual is generated by grouping identical sequences; the detailed process is shown in Fig. 4. The restriction-end sequence information of each stack can be stored as in Fig. 5, where the first column is the restriction-end sequence, the second column is the number of times that sequence was sequenced (i.e. the sequencing depth), and the third column is the stack ID, which uniquely identifies the stack.
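The three-column layout of Fig. 5 (restriction-end sequence, depth, stack ID) can be produced from one individual's filtered reads as sketched below. `collections.Counter` counts identical sequences; the ID scheme (sample name plus a running number, ordered by decreasing depth) is a hypothetical choice, since the text only requires that the ID uniquely identify a stack.

```python
from collections import Counter


def stack_table(sample: str, reads: list[str]) -> list[tuple[str, int, str]]:
    """Rows of (restriction-end sequence, sequencing depth, unique stack ID)."""
    counts = Counter(reads)
    # most_common() orders by decreasing depth; ties keep first-seen order.
    return [(seq, depth, f"{sample}_{i}")
            for i, (seq, depth) in enumerate(counts.most_common(), start=1)]
```

Printing each row with tab separators reproduces the kind of table shown in Fig. 5.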
3) The stacks within an individual are compared. If the clustering does not show the pattern in Fig. 6, these sequences carry no heterozygous SNP within the individual; if the pattern in Fig. 6 appears, a heterozygous SNP is present within the individual.
4) When stacks of different individuals are compared pairwise, the pattern in Fig. 6(a) indicates a homozygous SNP between the two individuals, and the pattern in Fig. 6(b) indicates a heterozygous SNP between them.
5) After stack clustering and comparison, the base types and corresponding depths at each candidate SNP site are obtained; Fig. 7 shows a summary of this information.
6) The base types and depths are analyzed with the Bayesian statistical method to obtain the final SNP genotypes, as shown in Fig. 8.
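Fig. 7 writes each individual's observations at a site as base|depth pairs joined by colons (for example "C|9:T|3"). A small formatter and parser for that notation might look like this; ordering bases by decreasing depth and using an empty string for "no data" are assumptions, and `format_site` and `parse_site` are hypothetical names.

```python
def format_site(depths: dict[str, int]) -> str:
    """{"C": 9, "T": 3} -> "C|9:T|3"; bases with zero depth are omitted."""
    items = sorted(((b, d) for b, d in depths.items() if d > 0), key=lambda x: -x[1])
    return ":".join(f"{b}|{d}" for b, d in items)


def parse_site(text: str) -> dict[str, int]:
    """"C|9:T|3" -> {"C": 9, "T": 3}; an empty string means no data."""
    if not text:
        return {}
    return {b: int(d) for b, d in (item.split("|") for item in text.split(":"))}
```

The parsed dict is the per-base depth input a genotype test needs at each site.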
Example data:
A bottle gourd F2 population genetic map project comprising 139 F2 plants and the two parents; all 141 individuals were RAD-PE sequenced. (Note: the progeny of the cross between the male and female parents are the F1, and the progeny produced by selfing the F1 are the F2. Although RAD-PE sequencing was used, only the restriction-end reads were analyzed.)
Material source: Zhejiang Academy of Agricultural Sciences.
Procedure of the example:
The sequencing data obtained by RAD-PE sequencing of the two parents were filtered on sequencing quality values, N content, and the presence of the restriction-site end sequence to remove unqualified reads; statistics of the resulting usable data are shown in Table 1.
Table 1: Statistics of usable bottle gourd RAD sequencing data
Sample | Usable data (bp) | Sample | Usable data (bp) | Sample | Usable data (bp)
Male parent | 585,377,540 | F2-46 | 3,800,775 | F2-93 | 1,556,104
Female parent | 423,794,746 | F2-47 | 2,522,407 | F2-94 | 1,651,259
F2-1 | 3,114,771 | F2-48 | 4,636,152 | F2-95 | 3,213,147
F2-2 | 2,302,730 | F2-49 | 3,737,623 | F2-96 | 2,202,354
F2-3 | 537,822 | F2-50 | 647,499 | F2-97 | 1,956,440
F2-4 | 1,650,925 | F2-51 | 3,678,334 | F2-98 | 1,112,431
F2-5 | 2,824,708 | F2-52 | 2,153,996 | F2-99 | 1,086,168
F2-6 | 579,177 | F2-53 | 7,029,889 | F2-100 | 1,705,836
F2-7 | 2,805,093 | F2-54 | 2,315,687 | F2-101 | 2,311,919
F2-8 | 2,234,442 | F2-55 | 4,116,520 | F2-102 | 1,445,671
F2-9 | 1,814,510 | F2-56 | 1,554,335 | F2-103 | 5,292,536
F2-10 | 2,063,581 | F2-57 | 1,912,949 | F2-104 | 371,736
F2-11 | 437,393 | F2-58 | 4,513,981 | F2-105 | 5,528,190
F2-12 | 1,114,627 | F2-59 | 4,660,158 | F2-106 | 2,286,908
F2-13 | 292,168 | F2-60 | 2,963,600 | F2-107 | 2,977,378
F2-14 | 1,981,379 | F2-61 | 1,912,346 | F2-108 | 1,113,885
F2-15 | 1,710,808 | F2-62 | 2,198,112 | F2-109 | 2,358,577
F2-16 | 1,666,317 | F2-63 | 2,149,228 | F2-110 | 2,014,988
F2-17 | 3,837,185 | F2-64 | 2,907,400 | F2-111 | 5,021,837
F2-18 | 2,705,794 | F2-65 | 2,565,399 | F2-112 | 1,687,183
F2-19 | 1,641,718 | F2-66 | 1,802,757 | F2-113 | 1,454,774
F2-20 | 4,181,837 | F2-67 | 6,136,789 | F2-114 | 1,187,993
F2-21 | 2,167,926 | F2-68 | 5,106,060 | F2-115 | 917,204
F2-22 | 8,967 | F2-69 | 5,492,357 | F2-116 | 673,176
F2-23 | 1,936,761 | F2-70 | 4,925,717 | F2-117 | 903,357
F2-24 | 4,907,028 | F2-71 | 2,016,103 | F2-118 | 1,252,469
F2-25 | 2,641,269 | F2-72 | 4,495,767 | F2-119 | 1,066,660
F2-26 | 1,344,809 | F2-73 | 957,643 | F2-120 | 83,426
F2-27 | 2,184,764 | F2-74 | 3,193,347 | F2-121 | 624,005
F2-28 | 2,312,351 | F2-75 | 30,335 | F2-122 | 4,246,910
F2-29 | 1,318,322 | F2-76 | 2,067,906 | F2-123 | 824,013
F2-30 | 1,830,247 | F2-77 | 200,856 | F2-124 | 3,322,863
F2-31 | 358,911 | F2-78 | 6,978,303 | F2-125 | 89,336
F2-32 | 1,450,039 | F2-79 | 5,309,200 | F2-126 | 367,005
F2-33 | 1,767,194 | F2-80 | 3,081,537 | F2-127 | 1,707,758
F2-34 | 1,587,589 | F2-81 | 3,071,091 | F2-128 | 2,385,919
F2-35 | 1,008,970 | F2-82 | 1,223,914 | F2-129 | 2,786,068
F2-36 | 2,877,974 | F2-83 | 5,586,662 | F2-130 | 890,661
F2-37 | 1,268,211 | F2-84 | 1,880,660 | F2-131 | 1,980,472
F2-38 | 6,0[illegible],973 | F2-85 | 2,620,672 | F2-132 | 3,920,370
F2-39 | 3,219,262 | F2-86 | 5,992,662 | F2-133 | 500,349
F2-40 | 2,241,103 | F2-87 | 5,636,602 | F2-134 | 2,150,765
F2-41 | 3,610,730 | F2-88 | 544,490 | F2-135 | 1,115,366
F2-42 | 1,641,063 | F2-89 | 4,897,907 | F2-136 | 1,306,934
F2-43 | 2,382,923 | F2-90 | 2,355,593 | F2-137 | 616,728
F2-44 | 2,598,428 | F2-91 | 2,506,271 | F2-138 | 949,308
F2-45 | 692,675 | F2-92 | 841,285 | F2-139 | 1,014,677
From the restriction-end reads of each sequenced individual, stack information was generated for each of the two parents by grouping identical sequences. In total, 3,913 candidate SNP markers were obtained for the F2 population; their base types and corresponding depths resemble Fig. 7 (the data are too large to show). The candidate SNP markers were genotyped both with the empirical threshold method in common use (depth not less than 6, heterozygous if the ratio is between 0.25 and 0.75) and with the Bayesian statistical method. The missing data after each method were counted (Fig. 9): the two methods differ, but not significantly. Analysis of the genotype at each site, however, found 111,005 sites called homozygous by the Bayesian method but heterozygous by the empirical threshold method, and 79,401 sites called heterozygous by the Bayesian method but homozygous by the empirical threshold method; these two kinds of sites are termed uncertain sites. Sanger sequencing of 100 randomly selected uncertain sites showed that the Bayesian method was correct at 77% of them, significantly higher than the empirical threshold method (three examples are shown in Fig. 11).
These results show that the Bayesian method is a reliable statistical approach for SNP identification from RAD sequencing data.
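The uncertain sites described above, and the random choice of sites for Sanger verification, can be sketched as follows. This assumes each method's calls are stored as a dict from site identifier to "hom", "het" or "missing"; the function names `uncertain_sites` and `pick_for_validation` are hypothetical.

```python
import random
from typing import Optional


def uncertain_sites(bayes: dict[str, str], empirical: dict[str, str]) -> tuple[list[str], list[str]]:
    """Sites where the two methods disagree on zygosity.

    Returns (homozygous by Bayes but heterozygous by the empirical method,
             heterozygous by Bayes but homozygous by the empirical method).
    """
    hom_het = [s for s, g in bayes.items() if g == "hom" and empirical.get(s) == "het"]
    het_hom = [s for s, g in bayes.items() if g == "het" and empirical.get(s) == "hom"]
    return hom_het, het_hom


def pick_for_validation(sites: list[str], n: int = 100, seed: Optional[int] = None) -> list[str]:
    """Randomly choose up to n uncertain sites, e.g. for Sanger sequencing."""
    return random.Random(seed).sample(sites, min(n, len(sites)))
```

Sites that either method leaves as missing fall into neither list, matching the definition of uncertain sites as homozygous/heterozygous disagreements.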

Claims (5)

1. A method for identifying single nucleotide polymorphism sites based on restriction-enzyme digestion library sequencing and Bayesian statistics, comprising the following steps:
1) after the sequencing results of RAD high-throughput sequencing are obtained, filtering the RAD reads to remove unqualified reads;
2) generating the stack information of each individual by grouping identical sequences, and calculating the sequencing depth of each stack;
3) comparing all stacks within an individual pairwise, one by one, and clustering the stacks to determine candidate heterozygous SNP sites within the individual;
4) comparing all stacks of different individuals pairwise, and clustering the stacks to determine candidate SNP sites between individuals;
5) analyzing the depth of each genotype at each candidate SNP site with a Bayesian statistical method to assess the accuracy of the candidate SNPs, for use in subsequent work such as population analysis or experiments.
2. The method according to claim 1, characterized in that in step 1) the RAD high-throughput sequencing is Illumina GA sequencing; the default sequencing type is single-end sequencing, and for paired-end sequencing only the restriction end is used for single nucleotide polymorphism site analysis.
3. The method according to claim 1, characterized in that in step 1) the unqualified reads are reads in which the bases with sequencing quality below a predetermined low-quality threshold make up more than 50% of the bases in the read, and reads that do not begin with the restriction-site signature sequence.
4. The method according to claim 1, characterized in that in step 3), when the stacks are compared with one another, only two distinct classes of stacks are allowed, and stacks of more than two classes are filtered out.
5. The method according to claim 1, characterized in that in step 4) all stacks obtained by the comparisons are retained, and stacks of more than two classes are not filtered.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310077509.1A CN103114150B (en) 2013-03-11 2013-03-11 Method for identifying single nucleotide polymorphism sites based on restriction-enzyme digestion library sequencing and Bayesian statistics


Publications (2)

Publication Number Publication Date
CN103114150A true CN103114150A (en) 2013-05-22
CN103114150B CN103114150B (en) 2016-07-06

Family

ID=48412567


Country Status (1)

Country Link
CN (1) CN103114150B (en)


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
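岳桂东 (Yue Guidong) et al.: "Applications of high-throughput sequencing technology in animal and plant research", Scientia Sinica Vitae (《中国科学:生命科学》) *
谢为博 (Xie Weibo): "Development of high-throughput genotyping methods based on expression microarrays and next-generation sequencing technology", China Doctoral Dissertations Full-text Database, Basic Sciences (《中国博士学位论文全文数据库 基础科学辑》) *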

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089436B2 (en) 2013-11-01 2018-10-02 Accurascience, Llc Method and apparatus for calling single-nucleotide variations and other variations
WO2015062184A1 (en) * 2013-11-01 2015-05-07 Accurascience, Llc Method and apparatus for calling single-nucleotide variations and other variations
CN104156633A (en) * 2014-08-12 2014-11-19 上海美吉生物医药科技有限公司 Method for perfecting SSR map on basis of RAD map
CN104156633B (en) * 2014-08-12 2017-03-01 上海美吉生物医药科技有限公司 Method for perfecting SSR map on basis of RAD map
CN104946765A (en) * 2015-06-25 2015-09-30 华中农业大学 Somatic mutation site excavation method based on genomic sequencing
CN105525012B (en) * 2016-01-27 2019-04-16 山东省农业科学院生物技术研究中心 Molecular identification method of peanut hybrid
CN105525012A (en) * 2016-01-27 2016-04-27 山东省农业科学院生物技术研究中心 Molecular identification method of peanut hybrid
CN108277267A (en) * 2016-12-29 2018-07-13 安诺优达基因科技(北京)有限公司 Device for detecting gene mutations and kit for genotyping pregnant women and fetuses
CN108277267B (en) * 2016-12-29 2019-08-13 安诺优达基因科技(北京)有限公司 Device for detecting gene mutations and kit for genotyping pregnant women and fetuses
CN107273715A (en) * 2017-05-10 2017-10-20 安吉康尔(深圳)科技有限公司 Detection method and device
CN107273715B (en) * 2017-05-10 2020-03-17 安吉康尔(深圳)科技有限公司 Detection method and device
CN111919257A (en) * 2018-07-27 2020-11-10 思勤有限公司 Reducing noise in sequencing data
CN111919257B (en) * 2018-07-27 2021-05-28 思勤有限公司 Method and system for reducing noise in sequencing data, and implementation and application thereof
CN114078568A (en) * 2020-09-14 2022-02-22 青岛欧易生物科技有限公司 Metagenome sequencing data processing system and processing method based on IIB type restriction endonuclease characteristics
CN114078568B (en) * 2020-09-14 2022-07-05 青岛欧易生物科技有限公司 Metagenome sequencing data processing system and processing method based on IIB type restriction endonuclease characteristics
CN113718342A (en) * 2021-05-06 2021-11-30 安徽农业大学 Construction method of high-density genetic map of recombinant inbred line population

Also Published As

Publication number Publication date
CN103114150B (en) 2016-07-06

Similar Documents

Publication Publication Date Title
CN103114150A (en) Single nucleotide polymorphism site identification method based on digestion library-establishing and sequencing and bayesian statistics
US11242569B2 (en) Methods to determine tumor gene copy number by analysis of cell-free DNA
US20230242977A1 (en) Universal short adapters with variable length non-random unique molecular identifiers
KR20240014606A (en) Methods and processes for non-invasive assessment of genetic variations
CN110021351B (en) Method and system for analyzing base linkage strength and genotyping
WO2021232388A1 (en) Method for determining base type of predetermined site in embryonic cell chromosome, and application thereof
CN104293940A (en) Method for constructing sequencing library and application of sequencing library
CN104264231A (en) Method for constructing sequencing library and application of sequencing library
CN105950707A (en) Method and system for determining nucleic acid sequence
CN104293941A (en) Method for constructing sequencing library and application of sequencing library
JP7362789B2 (en) Systems, computer programs and methods for determining genetic relationships between sperm donors, oocyte donors and their respective conceptuses
CN112226529A (en) SNP molecular marker of wax gourd blight-resistant gene and application
CN102831331B (en) Primer design developing method of length polymorphism sign based on restriction enzyme digestion database-establishing pair-end sequencing
Sell Addressing challenges of ancient DNA sequence data obtained with next generation methods
US20220364080A1 (en) Methods for dna library generation to facilitate the detection and reporting of low frequency variants
JP7446343B2 (en) Systems, computer programs and methods for determining genome ploidy
EP3409788B1 (en) Method and system for nucleic acid sequencing
US20200075124A1 (en) Methods and systems for detecting allelic imbalance in cell-free nucleic acid samples
US20140136121A1 (en) Method for assembling sequenced segments
Hu et al. Processing UMI Datasets at High Accuracy and Efficiency with the Sentieon ctDNA Analysis Pipeline
US20130143746A1 (en) Method for detecting gene region features based on inter-alu polymerase chain reaction
RU2799654C2 (en) Sequence graph-based tool for determining variation in short tandem repeat areas
US20220356513A1 (en) Synthetic polynucleotides and method of use thereof in genetic analysis
Fatima Whole-Genome Sequencing of two Swedish Individuals on PromethION
Fu Analysis of Admixed Animals using Indirect Haplotype Information from Existing Technologies

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Xianglin

Inventor after: Zhang Yitong

Inventor after: Tao Ye

Inventor after: Zheng Zequn

Inventor after: Hu Qiuping

Inventor before: Tao Ye

Inventor before: Qian Gang

Inventor before: Zheng Zequn

Inventor before: Hu Qiuping

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant