CN103114150A - Single nucleotide polymorphism site identification method based on digestion library-establishing and sequencing and bayesian statistics - Google Patents
- Publication number
- CN103114150A (application CN201310077509.1A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- sequencing
- enzyme
- heap
- snp
- Legal status
- Granted
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a single nucleotide polymorphism (SNP) site identification method based on restriction-digestion library construction and sequencing and Bayesian statistics. The method processes RAD (restriction-site associated DNA) sequencing data, searches for candidate SNPs on RAD sequencing fragments, and assesses the reliability of each SNP with a bioinformatic analysis method based on Bayesian statistics. The method can be applied to both model and non-model organisms, removing the limitation that many species lack reference sequences and reducing sequencing cost. It also addresses the current bottleneck that no reliable statistical method exists for identifying SNPs from RAD data, so the accuracy of the resulting SNP sites is greatly improved.
Description
Technical field
The present invention relates to a method for identifying single nucleotide polymorphism (SNP) sites based on restriction-digestion library construction and sequencing combined with Bayesian statistics. Specifically, a dedicated Bayesian statistical test is applied to SNP sites obtained by single-end or paired-end sequencing of restriction-digested libraries, so that SNP genotypes are identified accurately; even when no reference genome sequence is available, the method gives SNP calls a reliable statistical significance. The method belongs to the field of bioinformatics. It is of considerable importance for research on non-model organisms that lack reference sequences and for improving the accuracy of genotype identification.
Background art
SNP (single nucleotide polymorphism) refers to variation of a single nucleotide in the genome. SNPs are very numerous: roughly one SNP occurs every 1,000 bases, and they are the richest form of polymorphism in the human genome. SNPs are ideal markers for many kinds of molecular biology research, such as genetic map construction, genotyping, molecular markers, disease prediction, and medication guidance.
Second-generation DNA sequencing is a low-cost, high-throughput technology whose basic principle is sequencing by synthesis. Taking Solexa sequencing as an example, the DNA strands are first sheared randomly by physical methods, and defined adapters carrying amplification primer sequences are then ligated to both ends of the fragments. During sequencing, DNA polymerase synthesizes the complementary strand of the fragment to be sequenced, and the base sequence is read by detecting the fluorescent signal carried by each newly incorporated base, yielding the sequence of the fragment.
Second-generation sequencing has been widely used in many fields of bioscience, particularly in studying polymorphism among different individuals of a species. The traditional approach to finding SNP markers is to sequence individuals to obtain short reads, align these reads back to a reference sequence with short-read alignment software, and thereby obtain SNP information for the sequenced individuals. Common workflows (general procedure shown in Fig. 1) are: aligning reads to the reference with BWA and processing the alignments with SAMtools to find SNP sites [1,2]; or aligning reads with SOAP and processing the alignments with SOAPsnp to find SNP sites [3,4]. SNP markers are easy to find for species that have a reference sequence, but non-model organisms basically have none, and without a reference sequence the traditional approach faces a technical bottleneck.
RAD sequencing uses a new library construction strategy (restriction-digestion library construction); its detailed workflow is shown in Fig. 2. Specific sites in the DNA are cut with a restriction enzyme, the digested DNA is then randomly sheared by physical methods, DNA molecules of a specific length are selected by agarose gel separation, and specific amplification adapters and sequencing adapters are ligated to the ends of the selected DNA, yielding a library that is loaded onto the sequencer for high-throughput sequencing.
RAD sequencing is a method well known in the art; see, for example, references [5,6,7,8]. Identifying SNP sites with RAD sequencing has succeeded in many fields, but until the present invention, the methods used generally relied on empirical thresholds for screening and filtering. For example, in reference [6], sites where the ratio of the two bases' depths (depth of the lower-depth base : depth of the higher-depth base) lies between 0.25 and 0.75 are called heterozygous, and sites where the ratio is below 0.1 are called homozygous. This approach has no statistical significance and is strongly affected by external factors such as total sequencing output, so the accuracy of the resulting SNP genotypes cannot be guaranteed. Reference [9] improved on the empirical-threshold approach by using maximum likelihood to correct site genotypes, but its biggest problem is determining the error rate used in the statistical method.
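As a point of comparison, the empirical-threshold rule attributed above to reference [6] can be sketched as follows. This is a minimal illustration, assuming the depth ratio is computed as (depth of the second-deepest base) / (depth of the deepest base), and adding the minimum total depth of 6 that the embodiment uses for the empirical method; the function name `empirical_genotype` is hypothetical. Sites whose ratio falls outside both bands are left uncalled here, because the text does not say how they are handled.

```python
def empirical_genotype(depths, min_depth=6, het_low=0.25, het_high=0.75, hom_max=0.1):
    """Call a genotype from per-base depths with fixed ratio thresholds.

    depths: mapping base -> read depth at one site, e.g. {"C": 9, "T": 3}.
    Returns "hom", "het", or None (missing or outside both bands).
    Assumption: min_depth is applied to the total depth at the site.
    """
    ranked = sorted(depths.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(depth for _, depth in ranked)
    if total < min_depth:
        return None                      # too shallow to call
    if len(ranked) == 1:
        return "hom"                     # only one base observed
    low_over_high = ranked[1][1] / ranked[0][1]
    if het_low <= low_over_high <= het_high:
        return "het"
    if low_over_high < hom_max:
        return "hom"
    return None                          # ratio between the two bands
```

For example, `empirical_genotype({"C": 9, "T": 3})` gives a ratio of 1/3 and is called heterozygous, while `{"A": 10, "G": 2}` (ratio 0.2) falls between the bands and stays uncalled, which illustrates how the fixed cut-offs leave gaps that carry no statistical meaning.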
References:
1. Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 2009. 25(14): p. 1754-60.
2. Li, H., et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics, 2009. 25(16): p. 2078-9.
3. Li, R., et al., SNP detection for massively parallel whole-genome resequencing. Genome Res, 2009. 19(6): p. 1124-32.
4. Li, R., et al., SOAP: short oligonucleotide alignment program. Bioinformatics, 2008. 24(5): p. 713-4.
5. Houston, R.D., et al., Characterisation of QTL-linked and genome-wide restriction site-associated DNA (RAD) markers in farmed Atlantic salmon. BMC Genomics, 2012. 13(1): p. 244.
6. Scaglione, D., et al., RAD tag sequencing as a source of SNP markers in Cynara cardunculus L. BMC Genomics, 2012. 13(1): p. 3.
7. Davey, J.W., et al., Special features of RAD Sequencing data: implications for genotyping. Mol Ecol, 2012.
8. Dasmahapatra, K.K., et al., Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature, 2012.
9. Hohenlohe, P.A., et al., Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet. 6(2): p. e1000862.
Summary of the invention
The purpose of the invention is to provide a method for identifying SNP sites based on restriction-digestion library construction and sequencing and Bayesian statistics: a technical solution that processes the sequencing data obtained by restriction-digestion library sequencing (RAD sequencing), finds SNP sites within and between individuals, and provides a statistical test for them.
Purpose of the present invention is achieved through the following technical solutions:
A method for identifying SNP sites based on restriction-digestion library construction and sequencing and Bayesian statistics, comprising the following steps:
1) After the RAD high-throughput sequencing results are obtained, filter the RAD restriction-site-end reads to remove unqualified reads.
Here, the RAD high-throughput sequencing technology may be Illumina GA sequencing or another existing high-throughput sequencing technology.
Unqualified reads are reads in which the number of bases whose sequencing quality is below a predetermined low-quality threshold exceeds 50% of the read's total bases, and reads that lack the restriction-site signature.
2) Using the restriction-site-end reads of each sequenced individual genome, generate each individual's sequence-stack information based on exact sequence identity, and calculate the sequencing depth of each stack. For example, each filtered restriction-end read of an individual is used as a hash key, and the hash value points to a linked list that stores the sequence information of the other end; the sequencing depth is then calculated. This process can be implemented in any programming language.
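A minimal sketch of the hash-based stack construction in step 2), assuming reads arrive as (restriction-end sequence, mate sequence) pairs; a Python list stands in for the linked list, and `build_stacks` and its output field names are hypothetical. The fields `seq`, `depth`, and `id` correspond to the three columns described for Fig. 5.

```python
from collections import defaultdict


def build_stacks(read_pairs):
    """Group reads whose restriction-site-end sequence is identical.

    read_pairs: iterable of (cut_end_seq, mate_seq); mate_seq is None
    for single-end data.
    Returns a list of stacks, deepest first, each with an id, the
    restriction-end sequence, its depth, and the stored mate sequences.
    """
    table = defaultdict(list)            # key: cut-end sequence -> mates
    for cut_seq, mate_seq in read_pairs:
        table[cut_seq].append(mate_seq)

    stacks = []
    ordered = sorted(table.items(), key=lambda kv: -len(kv[1]))  # stable sort
    for stack_id, (seq, mates) in enumerate(ordered, start=1):
        stacks.append({
            "id": stack_id,                            # unique stack ID
            "seq": seq,                                # restriction-end sequence
            "depth": len(mates),                       # times it was sequenced
            "mates": [m for m in mates if m is not None],
        })
    return stacks
```

For single-end data the `mates` lists are simply empty; the depth still counts every read in the stack.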
3) Compare all stacks within an individual pairwise, one by one, and cluster the stacks to determine candidate heterozygous SNP sites within the individual.
For the clustering result of step 3), a cluster containing only one stack indicates that there is no heterozygous site on that restriction-end sequencing fragment, whereas a cluster containing two stacks indicates that there is a heterozygous site on it.
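Step 3) does not fix the comparison criterion in this text. The sketch below assumes that two stacks of equal length belong to the same cluster when they differ at exactly one position, groups stacks with a union-find structure, and, following claim 4, discards clusters with more than two stacks. All names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster_within_individual(stacks):
    """Cluster one individual's stacks and report candidate heterozygous sites.

    stacks: dict mapping restriction-end sequence -> depth.
    Returns a list of {"pos": index, "alleles": {base: depth, base: depth}}.
    Assumption: stacks are linked when they differ at exactly one position.
    """
    seqs = list(stacks)
    parent = {s: s for s in seqs}            # union-find forest

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]    # path halving
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):       # all pairwise comparisons
        if len(a) == len(b) and hamming(a, b) == 1:
            parent[find(a)] = find(b)

    clusters = {}
    for s in seqs:
        clusters.setdefault(find(s), []).append(s)

    candidates = []
    for members in clusters.values():
        if len(members) != 2:                # 1 stack: no het site; >2: filtered
            continue
        a, b = members
        pos = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
        candidates.append({"pos": pos,
                           "alleles": {a[pos]: stacks[a], b[pos]: stacks[b]}})
    return candidates
```

A two-stack cluster is only produced when its two members were linked directly, so they always differ at exactly one position, which is the candidate heterozygous site.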
4) Compare all stacks from different individuals pairwise and cluster them to determine candidate SNP sites between individuals.
For the clustering result of step 4), a cluster containing only one stack indicates that there is no SNP site between the two individuals, whereas a cluster containing two stacks indicates an SNP site between them.
Because a Bayesian statistical method is used subsequently, no depth filtering is applied here to individual stacks or clusters; this is one advantage of the method, as it retains as many SNP sites as possible.
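For step 4), a comparable sketch across two individuals is shown below. It assumes the same one-mismatch criterion, keeps every candidate pair without depth filtering (consistent with claim 5 and with the note above), and labels a pair "homozygous" or "heterozygous" according to whether either individual also carries the other's sequence; this mapping onto the two patterns of Fig. 6 is an assumption, and all names are hypothetical.

```python
from itertools import product


def mismatch_positions(a, b):
    """Indices where two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def compare_individuals(stacks_a, stacks_b):
    """Candidate SNPs between two individuals (dicts: sequence -> depth).

    All one-mismatch stack pairs are kept; nothing is filtered by depth.
    "homozygous": neither individual carries the other's sequence.
    "heterozygous": one individual also carries the other's sequence.
    """
    candidates = []
    for seq_a, seq_b in product(stacks_a, stacks_b):
        if len(seq_a) != len(seq_b):
            continue
        diffs = mismatch_positions(seq_a, seq_b)
        if len(diffs) != 1:
            continue                          # identical or too divergent
        pos = diffs[0]
        shared = seq_a in stacks_b or seq_b in stacks_a
        candidates.append({
            "pos": pos,
            "base_a": seq_a[pos], "depth_a": stacks_a[seq_a],
            "base_b": seq_b[pos], "depth_b": stacks_b[seq_b],
            "pattern": "heterozygous" if shared else "homozygous",
        })
    return candidates
```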
5) Use a Bayesian statistical method to analyse the depth of each genotype (base type) at every candidate SNP site and assess the accuracy of the candidate SNPs. Because a reference sequence and the prior probabilities of each base at each site are lacking, the error rate of the observed bases at each site cannot be determined. The Bayesian method therefore exhaustively walks through all error rates that might occur and takes the case that gives the genotype the highest probability as the genotype of the SNP site. The formulas and calculation steps are as follows:
For each candidate SNP site, up to four bases may be observed, i.e., any one or more of "A", "T", "C", and "G". The base with the highest frequency (greatest depth) is defined as $G_1$ with depth $N_1$, and the remaining bases are defined in decreasing order of depth, i.e., genotype $G_i$ ($i = 1, 2, 3, 4$) with depth $N_i$ ($i = 1, 2, 3, 4$). Biologically, a species generally shows only two base types at a SNP site; for example, if the sequencing data show A or T at this SNP site ($N_A \ge N_T$), the site must carry one of two genotypes, homozygous or heterozygous. This Bayesian method therefore tests only these two genotypes; their conditional probability given an error rate $\varepsilon$ is:
The posterior probability of each genotype is:
Because there is no reference sequence or earlier research data, the error rate $\varepsilon$ cannot be fixed in advance, but references [10,11] report an Illumina GA sequencing error rate of about 1%. Here $\varepsilon$ is scanned from 0.01% to 5% in steps of 0.01%. The final posterior probability used is:
$$P(N_{ij})_{\mathrm{Final}} = \max P(N_{ij}, \varepsilon), \quad i, j \in \{1, 2, 3, 4\},\ \varepsilon \in [0.01\%, 5\%]$$
If $P(N_{ij})_{\mathrm{Final}} \ge 0.95$, the genotype at this SNP site is called $ij$; otherwise the site is recorded as data that cannot be determined (missing data).
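The likelihood formulas for the two genotypes are not shown in this text, so the sketch below should be read as one plausible instance of the procedure rather than the patent's exact computation. It assumes a common binomial read model (an error turns the true base into each other base with probability ε/3) and a flat prior over homozygous $G_1G_1$ versus heterozygous $G_1G_2$, then walks ε over the stated grid of 0.01% to 5% in 0.01% steps and applies the 0.95 cut-off. The names `EPS_GRID`, `posterior`, and `call_genotype` are hypothetical.

```python
import math

# Error-rate grid from the text: 0.01% to 5% in steps of 0.01% (500 values).
EPS_GRID = [k / 10000 for k in range(1, 501)]


def _log_likelihood(n1, n2, eps, genotype):
    """Log-likelihood of depths (n1 of G1, n2 of G2) under one genotype.

    Assumed read model (not taken from the patent): a sequencing error turns
    the true base into each of the other three bases with probability eps/3.
    The binomial coefficient is omitted because it cancels in the posterior.
    """
    if genotype == "hom":                 # G1G1: every G2 read is an error
        p1, p2 = 1 - eps, eps / 3
    else:                                 # G1G2: each allele sampled with prob 1/2
        p1 = p2 = 0.5 * (1 - eps) + 0.5 * (eps / 3)
    return n1 * math.log(p1) + n2 * math.log(p2)


def posterior(n1, n2, eps):
    """Posterior of hom vs het at a fixed eps, with a flat prior over the two."""
    log_hom = _log_likelihood(n1, n2, eps, "hom")
    log_het = _log_likelihood(n1, n2, eps, "het")
    top = max(log_hom, log_het)           # subtract the max for numerical stability
    w_hom, w_het = math.exp(log_hom - top), math.exp(log_het - top)
    return {"hom": w_hom / (w_hom + w_het), "het": w_het / (w_hom + w_het)}


def call_genotype(depths, threshold=0.95):
    """Return (genotype or None, {gt: (max posterior, eps at the max)}).

    depths: base -> depth; only the two deepest bases (G1, G2) are used.
    """
    ranked = sorted(depths.items(), key=lambda kv: kv[1], reverse=True)
    g1, n1 = ranked[0]
    g2, n2 = ranked[1] if len(ranked) > 1 else (None, 0)
    best = {"hom": (0.0, None), "het": (0.0, None)}
    for eps in EPS_GRID:                  # exhaustive walk over eps
        post = posterior(n1, n2, eps)
        for gt in best:
            if post[gt] > best[gt][0]:
                best[gt] = (post[gt], eps)
    gt, (p_final, _) = max(best.items(), key=lambda kv: kv[1][0])
    if p_final < threshold or (gt == "het" and g2 is None):
        return None, best                 # recorded as missing data
    return (g1 + g1 if gt == "hom" else g1 + g2), best
```

The returned `best` dictionary holds, for each genotype, the maximal posterior and the error rate at which it was reached, which is the same four-value summary described for Fig. 8 (x1:x2:x3:x4). For instance, depths of C = 9 and T = 3 are called "CT" under these assumptions, while a single A read stays uncalled because neither posterior reaches 0.95.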
The technical solution of the present invention uses bioinformatic analysis to process RAD (restriction-site associated DNA) sequencing data, find SNP site information on RAD sequencing fragments, and identify SNP genotypes with a Bayesian method, overcoming the bottleneck that non-model organisms lack reference sequences and obtaining accurate results at reduced cost. It is the first to introduce Bayesian statistics into SNP genotype identification; compared with the earlier empirical-threshold methods, statistical significance is markedly improved and accuracy increases correspondingly.
References:
10. Li, Y., et al., State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum Genomics, 2010. 4(4): p. 271-7.
11. Xie, W., et al., Parent-independent genotyping for constructing an ultrahigh-density linkage map based on population sequencing. Proc Natl Acad Sci U S A. 107(23): p. 10578-83.
Description of drawings
Fig. 1 is a schematic of the workflow of the conventional method for finding SNP sites;
Fig. 2 is a schematic of the detailed RAD sequencing workflow: (A) genomic DNA is digested with a restriction enzyme and P1 adapters are ligated, each P1 adapter carrying a different barcode sequence; (B) samples with different P1 adapters are pooled and sheared; (C) adapter P2 is ligated; (D) RAD tags are amplified and enriched;
Fig. 3 illustrates RAD sequencing;
Fig. 4 shows the clustering process for generating stacks; the EcoRI restriction enzyme is used in the example;
Fig. 5 is a schematic of the restriction-end sequence information in a stack;
Fig. 6 is a flow diagram of finding SNP sites within and between individuals using stacks;
Fig. 7 illustrates candidate SNP base types and depths, showing 20 candidate SNP sites in each of 15 individuals; "C|9" means base C was observed 9 times at the site, and "C|9:T|3" means C was observed 9 times and T 3 times.
Fig. 8 illustrates the SNP genotype results after Bayesian statistics. "a" and "b" denote two different homozygous genotypes and "h" denotes the heterozygous genotype; for example, "a" represents AA, "b" represents CC, and "h" represents AC. In "x1:x2:x3:x4", x1 is the homozygous posterior probability, x2 is the error rate at which the homozygous posterior is maximal, x3 is the heterozygous posterior probability, and x4 is the error rate at which the heterozygous posterior is maximal.
Fig. 9 shows the missing data after applying the empirical-threshold method and the Bayesian statistical method.
Fig. 10 shows statistics of the sites at which the empirical-threshold method and the Bayesian statistical method give different results.
Fig. 11 shows Sanger sequencing verification of randomly selected sites at which the empirical-threshold method and the Bayesian statistical method differ.
Embodiment
The technical features of the present invention are further described below with reference to the drawings and specific embodiments.
As shown in Fig. 2, RAD sequencing differs from conventional high-throughput sequencing in that the genome is completely digested with a restriction enzyme before adapters are added; RAD-specific adapters are then added, and subsequent library construction is the same as in conventional library construction. Fig. 3 illustrates single-end sequencing of the RAD restriction-site end. The example in Fig. 3 uses the restriction enzyme EcoRI, which recognizes the palindromic sequence "G^AATTC" in DNA and cuts between G and A. The digested DNA is sheared physically into short fragments, adapters are added to both ends of the fragments carrying the restriction-site end sequence, and the fragments are PCR-enriched and single-end sequenced; the read length is generally 100 nt, but can also be 50 nt.
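The EcoRI digestion described above can be illustrated with a small in-silico sketch. It scans only one strand (the EcoRI site is palindromic, so the same site is present on the reverse strand) and cuts one base into each `GAATTC` match; the function name `ecori_fragments` is hypothetical.

```python
import re

ECORI_SITE = "GAATTC"   # EcoRI recognition sequence, cut as G^AATTC
CUT_OFFSET = 1          # the cut falls between G and A


def ecori_fragments(genome):
    """Split a sequence at every EcoRI cut position (one strand only)."""
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(ECORI_SITE, genome)]
    bounds = [0] + cuts + [len(genome)]
    return [genome[start:end] for start, end in zip(bounds, bounds[1:]) if end > start]
```

Every fragment downstream of a cut begins with `AATTC`, which is why restriction-end reads are expected to start with that sequence; for example, `ecori_fragments("TTGAATTCAAAGAATTCCC")` yields `["TTG", "AATTCAAAG", "AATTCCC"]`.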
The method for identifying SNP sites based on restriction-digestion library construction and sequencing and Bayesian statistics comprises the following steps:
1) After the RAD high-throughput sequencing results are obtained, filter the paired-end RAD reads to remove unqualified reads.
Here, the RAD high-throughput sequencing technology may be Illumina GA sequencing or another existing high-throughput sequencing technology.
Unqualified reads are reads in which the number of bases whose sequencing quality is below a predetermined low-quality threshold exceeds 50% of the read's total bases, and reads that lack the restriction-site signature.
The low-quality threshold depends on the specific sequencing technology and sequencing environment; for example, it can be set to a per-base sequencing quality below 20. A read in which the number of undetermined bases (such as N in Illumina GA output) exceeds 10% of its total bases is considered unqualified. Reads are also compared against exogenous sequences introduced by other experimental steps (other than the sample barcode sequence), such as the various adapter sequences; a read containing an exogenous sequence is considered unqualified. Finally, among the restriction-site-end reads, a read whose first few bases are not the restriction-site end sequence is filtered out (for example, with EcoRI, any read that does not start with "AATTC" is discarded entirely).
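The filtering rules above can be sketched as a single predicate. This assumes integer Phred qualities, decoded here with the Phred+33 offset (older Illumina GA pipelines used Phred+64, so the offset is an assumption), and treats exogenous-sequence contamination as an exact substring match, which is a simplification. The thresholds (quality 20, 50% low-quality bases, 10% N, EcoRI remnant `AATTC`) are those given in the text; the function names are hypothetical.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string assuming the Phred+33 offset."""
    return [ord(c) - 33 for c in qual_string]


def is_qualified(seq, quals, enzyme_remnant="AATTC", low_q=20,
                 max_low_frac=0.5, max_n_frac=0.1, contaminants=()):
    """Return True if a restriction-end read passes all filters."""
    n = len(seq)
    if n == 0 or len(quals) != n:
        return False
    if not seq.startswith(enzyme_remnant):          # no restriction-site signature
        return False
    if sum(q < low_q for q in quals) > max_low_frac * n:
        return False                                # >50% low-quality bases
    if seq.count("N") > max_n_frac * n:
        return False                                # >10% undetermined bases
    if any(c in seq for c in contaminants):
        return False                                # adapter or other exogenous sequence
    return True
```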
2) Using the restriction-end reads of each sequenced individual genome, generate each individual's stack information based on sequence identity. The detailed process is shown in Fig. 4. The restriction-end sequence information in a stack can be stored as in Fig. 5, where the first column is the restriction-end sequence; the second column is the number of times that sequence was sequenced, i.e., its sequencing depth; and the third column is the stack ID, which uniquely identifies the stack.
3) Compare stacks within an individual: if clustering does not produce the situation shown in Fig. 6, these sequences carry no heterozygous SNP within the individual; if the situation in Fig. 6 occurs, a heterozygous SNP exists within the individual.
4) When stacks are compared pairwise between individuals, the situation in Fig. 6(a) indicates a homozygous SNP between the two individuals, and the situation in Fig. 6(b) indicates a heterozygous SNP between them.
5) After stack clustering and comparison are finished, the base types and corresponding depths at each candidate SNP site are obtained; Fig. 7 shows a summary of this information.
6) Analyse the base types and depths with the Bayesian statistical method to obtain the final SNP genotypes, as shown in Fig. 8.
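The Fig. 7 notation ("C|9" or "C|9:T|3") can be turned into the depth-ranked list G1/N1, G2/N2, ... used by the Bayesian step with a short parser. Treating "-" or an empty cell as missing data is an assumption, and `parse_site` is a hypothetical name.

```python
def parse_site(cell):
    """Parse 'C|9:T|3' into [(base, depth), ...] sorted by depth, deepest first."""
    cell = cell.strip()
    if cell in ("", "-"):                 # assumed notation for no data
        return []
    depths = {}
    for item in cell.split(":"):
        base, depth = item.split("|")
        depths[base] = depths.get(base, 0) + int(depth)
    # G1 is the deepest base, G2 the next, and so on (ties keep input order).
    return sorted(depths.items(), key=lambda kv: kv[1], reverse=True)
```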
Embodiment data:
A bottle gourd F2 population for genetic map construction, comprising 139 F2 plants and the two parents; all 141 individuals were sequenced by RAD-PE. (Note: the offspring of crossing the male and female parents is the F1, and the offspring of selfing the F1 is the F2. Although RAD-PE sequencing was used, only the restriction-site-end reads were analysed.)
Material source: Zhejiang Academy of Agricultural Science.
Operating procedure of the embodiment:
The reads obtained by RAD-PE sequencing of the two parents were filtered by sequencing quality value, N content, and presence of the restriction-site end sequence to remove unqualified reads; statistics of the resulting usable data are shown in Table 1.
Table 1: Statistics of usable bottle gourd RAD sequencing data
Sample | Usable data (bp) | Sample | Usable data (bp) | Sample | Usable data (bp) |
---|---|---|---|---|---|
Male parent | 585,377,540 | F2-46 | 3,800,775 | F2-93 | 1,556,104 |
Female parent | 423,794,746 | F2-47 | 2,522,407 | F2-94 | 1,651,259 |
F2-1 | 3,114,771 | F2-48 | 4,636,152 | F2-95 | 3,213,147 |
F2-2 | 2,302,730 | F2-49 | 3,737,623 | F2-96 | 2,202,354 |
F2-3 | 537,822 | F2-50 | 647,499 | F2-97 | 1,956,440 |
F2-4 | 1,650,925 | F2-51 | 3,678,334 | F2-98 | 1,112,431 |
F2-5 | 2,824,708 | F2-52 | 2,153,996 | F2-99 | 1,086,168 |
F2-6 | 579,177 | F2-53 | 7,029,889 | F2-100 | 1,705,836 |
F2-7 | 2,805,093 | F2-54 | 2,315,687 | F2-101 | 2,311,919 |
F2-8 | 2,234,442 | F2-55 | 4,116,520 | F2-102 | 1,445,671 |
F2-9 | 1,814,510 | F2-56 | 1,554,335 | F2-103 | 5,292,536 |
F2-10 | 2,063,581 | F2-57 | 1,912,949 | F2-104 | 371,736 |
F2-11 | 437,393 | F2-58 | 4,513,981 | F2-105 | 5,528,190 |
F2-12 | 1,114,627 | F2-59 | 4,660,158 | F2-106 | 2,286,908 |
F2-13 | 292,168 | F2-60 | 2,963,600 | F2-107 | 2,977,378 |
F2-14 | 1,981,379 | F2-61 | 1,912,346 | F2-108 | 1,113,885 |
F2-15 | 1,710,808 | F2-62 | 2,198,112 | F2-109 | 2,358,577 |
F2-16 | 1,666,317 | F2-63 | 2,149,228 | F2-110 | 2,014,988 |
F2-17 | 3,837,185 | F2-64 | 2,907,400 | F2-111 | 5,021,837 |
F2-18 | 2,705,794 | F2-65 | 2,565,399 | F2-112 | 1,687,183 |
F2-19 | 1,641,718 | F2-66 | 1,802,757 | F2-113 | 1,454,774 |
F2-20 | 4,181,837 | F2-67 | 6,136,789 | F2-114 | 1,187,993 |
F2-21 | 2,167,926 | F2-68 | 5,106,060 | F2-115 | 917,204 |
F2-22 | 8,967 | F2-69 | 5,492,357 | F2-116 | 673,176 |
F2-23 | 1,936,761 | F2-70 | 4,925,717 | F2-117 | 903,357 |
F2-24 | 4,907,028 | F2-71 | 2,016,103 | F2-118 | 1,252,469 |
F2-25 | 2,641,269 | F2-72 | 4,495,767 | F2-119 | 1,066,660 |
F2-26 | 1,344,809 | F2-73 | 957,643 | F2-120 | 83,426 |
F2-27 | 2,184,764 | F2-74 | 3,193,347 | F2-121 | 624,005 |
F2-28 | 2,312,351 | F2-75 | 30,335 | F2-122 | 4,246,910 |
F2-29 | 1,318,322 | F2-76 | 2,067,906 | F2-123 | 824,013 |
F2-30 | 1,830,247 | F2-77 | 200,856 | F2-124 | 3,322,863 |
F2-31 | 358,911 | F2-78 | 6,978,303 | F2-125 | 89,336 |
F2-32 | 1,450,039 | F2-79 | 5,309,200 | F2-126 | 367,005 |
F2-33 | 1,767,194 | F2-80 | 3,081,537 | F2-127 | 1,707,758 |
F2-34 | 1,587,589 | F2-81 | 3,071,091 | F2-128 | 2,385,919 |
F2-35 | 1,008,970 | F2-82 | 1,223,914 | F2-129 | 2,786,068 |
F2-36 | 2,877,974 | F2-83 | 5,586,662 | F2-130 | 890,661 |
F2-37 | 1,268,211 | F2-84 | 1,880,660 | F2-131 | 1,980,472 |
F2-38 | 6,0[illegible],973 | F2-85 | 2,620,672 | F2-132 | 3,920,370 |
F2-39 | 3,219,262 | F2-86 | 5,992,662 | F2-133 | 500,349 |
F2-40 | 2,241,103 | F2-87 | 5,636,602 | F2-134 | 2,150,765 |
F2-41 | 3,610,730 | F2-88 | 544,490 | F2-135 | 1,115,366 |
F2-42 | 1,641,063 | F2-89 | 4,897,907 | F2-136 | 1,306,934 |
F2-43 | 2,382,923 | F2-90 | 2,355,593 | F2-137 | 616,728 |
F2-44 | 2,598,428 | F2-91 | 2,506,271 | F2-138 | 949,308 |
F2-45 | 692,675 | F2-92 | 841,285 | F2-139 | 1,014,677 |
Using the restriction-end reads of each sequenced genome, stack information was generated for each of the two parents based on sequence identity. In total, 3,913 candidate SNP markers were obtained for the F2 population; their base types and depths resemble Fig. 7 (the data set is too large to show). The candidate markers were genotyped both with the commonly used empirical-threshold method (depth ≥ 6, heterozygote criterion 0.25-0.75) and with the Bayesian statistical method. Missing data were counted for the results of each method (Fig. 9): the two methods differ, but not significantly. A site-by-site genotype comparison, however, found 111,005 sites called homozygous by the Bayesian method but heterozygous by the empirical method, and 79,401 sites called heterozygous by the Bayesian method but homozygous by the empirical method; these two kinds of sites are termed uncertain sites. Sanger sequencing of 100 randomly selected uncertain sites showed that the Bayesian method was correct at 77% of them, significantly higher than the empirical-threshold method (three selected examples are shown in Fig. 11).
This result shows that the Bayesian method is a reliable statistical means for identifying SNPs from RAD sequencing data.
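The comparison behind the "uncertain site" counts above can be sketched as a tally over calls made by both methods. This assumes each method's output is a mapping from (marker, individual) to "hom", "het", or None for missing data; calls missing in either method are skipped, and all names are hypothetical.

```python
from collections import Counter


def compare_calls(empirical, bayes):
    """Tally concordant and discordant genotype calls between two methods."""
    counts = Counter()
    for key in empirical.keys() & bayes.keys():
        e, b = empirical[key], bayes[key]
        if e is None or b is None:
            continue                                  # missing in either method
        if b == "hom" and e == "het":
            counts["bayes_hom_empirical_het"] += 1    # first kind of uncertain site
        elif b == "het" and e == "hom":
            counts["bayes_het_empirical_hom"] += 1    # second kind of uncertain site
        else:
            counts["concordant"] += 1
    return counts
```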
Claims (5)
1. A method for identifying single nucleotide polymorphism sites based on restriction-digestion library construction and sequencing and Bayesian statistics, comprising the following steps:
1) after RAD high-throughput sequencing results are obtained, filtering the RAD reads to remove unqualified reads;
2) generating sequence-stack information for each individual based on exact sequence identity, and calculating the sequencing depth of each stack;
3) comparing all stacks within an individual pairwise, one by one, and clustering the stacks to determine candidate heterozygous SNP sites within the individual;
4) comparing all stacks from different individuals pairwise and clustering the stacks to determine candidate SNP sites between individuals;
5) analysing the depth of each genotype at every candidate SNP site with a Bayesian statistical method to assess the accuracy of the candidate SNPs, for use in subsequent work such as population analysis or experiments.
2. The method according to claim 1, characterized in that in step 1) the RAD high-throughput sequencing technology is Illumina GA sequencing; the default sequencing type is single-end sequencing, and for paired-end sequencing, SNP analysis is performed only on the restriction-site end.
3. The method according to claim 1, characterized in that in step 1) the unqualified reads are reads in which the number of bases with sequencing quality below a predetermined low-quality threshold exceeds 50% of the read's total bases, and reads that do not begin with the restriction-site signature sequence.
4. The method according to claim 1, characterized in that in step 3), when the stacks are compared with one another, only two distinct classes of stacks are allowed, and clusters with more than two classes of stacks are filtered out.
5. The method according to claim 1, characterized in that in step 4), all stacks obtained by the comparison are retained, and clusters with more than two classes of stacks are not filtered.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310077509.1A CN103114150B (en) | 2013-03-11 | 2013-03-11 | Method for identifying single nucleotide polymorphism sites based on restriction-digestion library construction and sequencing and Bayesian statistics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310077509.1A CN103114150B (en) | 2013-03-11 | 2013-03-11 | Method for identifying single nucleotide polymorphism sites based on restriction-digestion library construction and sequencing and Bayesian statistics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103114150A true CN103114150A (en) | 2013-05-22 |
CN103114150B CN103114150B (en) | 2016-07-06 |
Family
ID=48412567
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310077509.1A Active CN103114150B (en) | 2013-03-11 | 2013-03-11 | Method for identifying single nucleotide polymorphism sites based on restriction-digestion library construction and sequencing and Bayesian statistics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103114150B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104156633A (en) * | 2014-08-12 | 2014-11-19 | 上海美吉生物医药科技有限公司 | Method for perfecting SSR map on basis of RAD map |
WO2015062184A1 (en) * | 2013-11-01 | 2015-05-07 | Accurascience, Llc | Method and apparatus for calling single-nucleotide variations and other variations |
CN104946765A (en) * | 2015-06-25 | 2015-09-30 | 华中农业大学 | Somatic mutation site excavation method based on genomic sequencing |
CN105525012A (en) * | 2016-01-27 | 2016-04-27 | 山东省农业科学院生物技术研究中心 | Molecular identification method of peanut hybrid |
CN107273715A (en) * | 2017-05-10 | 2017-10-20 | 安吉康尔(深圳)科技有限公司 | Detection method and device |
CN108277267A (en) * | 2016-12-29 | 2018-07-13 | 安诺优达基因科技(北京)有限公司 | Device for detecting gene mutations and kit for genotyping pregnant women and fetuses |
CN111919257A (en) * | 2018-07-27 | 2020-11-10 | 思勤有限公司 | Reducing noise in sequencing data |
CN113718342A (en) * | 2021-05-06 | 2021-11-30 | 安徽农业大学 | Construction method of high-density genetic map of recombinant inbred line population |
CN114078568A (en) * | 2020-09-14 | 2022-02-22 | 青岛欧易生物科技有限公司 | Metagenome sequencing data processing system and processing method based on IIB type restriction endonuclease characteristics |
2013
- 2013-03-11: CN201310077509.1A (CN103114150B), active
Non-Patent Citations (2)
Title |
---|
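岳桂东 (Yue Guidong) et al.: "Applications of high-throughput sequencing technology in animal and plant research", Scientia Sinica Vitae (中国科学:生命科学) * |
谢为博 (Xie Weibo): "Development of high-throughput genotyping methods based on expression profiling microarrays and next-generation sequencing technology", China Doctoral Dissertations Full-text Database, Basic Sciences (中国博士学位论文全文数据库 基础科学辑) * |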
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10089436B2 (en) | 2013-11-01 | 2018-10-02 | Accurascience, Llc | Method and apparatus for calling single-nucleotide variations and other variations |
WO2015062184A1 (en) * | 2013-11-01 | 2015-05-07 | Accurascience, Llc | Method and apparatus for calling single-nucleotide variations and other variations |
CN104156633A (en) * | 2014-08-12 | 2014-11-19 | 上海美吉生物医药科技有限公司 | Method for perfecting SSR map on basis of RAD map |
CN104156633B (en) * | 2014-08-12 | 2017-03-01 | 上海美吉生物医药科技有限公司 | Method for improving SSR maps based on RAD maps |
CN104946765A (en) * | 2015-06-25 | 2015-09-30 | 华中农业大学 | Somatic mutation site excavation method based on genomic sequencing |
CN105525012B (en) * | 2016-01-27 | 2019-04-16 | 山东省农业科学院生物技术研究中心 | Molecular identification method for peanut hybrids |
CN105525012A (en) * | 2016-01-27 | 2016-04-27 | 山东省农业科学院生物技术研究中心 | Molecular identification method of peanut hybrid |
CN108277267A (en) * | 2016-12-29 | 2018-07-13 | 安诺优达基因科技(北京)有限公司 | Device for detecting gene mutations and kit for genotyping pregnant women and fetuses |
CN108277267B (en) * | 2016-12-29 | 2019-08-13 | 安诺优达基因科技(北京)有限公司 | Device for detecting gene mutations and kit for genotyping pregnant women and fetuses |
CN107273715A (en) * | 2017-05-10 | 2017-10-20 | 安吉康尔(深圳)科技有限公司 | Detection method and device |
CN107273715B (en) * | 2017-05-10 | 2020-03-17 | 安吉康尔(深圳)科技有限公司 | Detection method and device |
CN111919257A (en) * | 2018-07-27 | 2020-11-10 | 思勤有限公司 | Reducing noise in sequencing data |
CN111919257B (en) * | 2018-07-27 | 2021-05-28 | 思勤有限公司 | Method and system for reducing noise in sequencing data, and implementation and application thereof |
CN114078568A (en) * | 2020-09-14 | 2022-02-22 | 青岛欧易生物科技有限公司 | Metagenome sequencing data processing system and processing method based on IIB type restriction endonuclease characteristics |
CN114078568B (en) * | 2020-09-14 | 2022-07-05 | 青岛欧易生物科技有限公司 | Metagenome sequencing data processing system and processing method based on IIB type restriction endonuclease characteristics |
CN113718342A (en) * | 2021-05-06 | 2021-11-30 | 安徽农业大学 | Construction method of high-density genetic map of recombinant inbred line population |
Also Published As
Publication number | Publication date |
---|---|
CN103114150B (en) | 2016-07-06 |
Similar Documents
Publication | Title |
---|---|
CN103114150A (en) | Single nucleotide polymorphism site identification method based on digestion library-establishing and sequencing and bayesian statistics |
US11242569B2 (en) | Methods to determine tumor gene copy number by analysis of cell-free DNA | |
US20230242977A1 (en) | Universal short adapters with variable length non-random unique molecular identifiers | |
KR20240014606A (en) | Methods and processes for non-invasive assessment of genetic variations | |
CN110021351B (en) | Method and system for analyzing base linkage strength and genotyping | |
WO2021232388A1 (en) | Method for determining base type of predetermined site in embryonic cell chromosome, and application thereof | |
CN104293940A (en) | Method for constructing sequencing library and application of sequencing library | |
CN104264231A (en) | Method for constructing sequencing library and application of sequencing library | |
CN105950707A (en) | Method and system for determining nucleic acid sequence | |
CN104293941A (en) | Method for constructing sequencing library and application of sequencing library | |
JP7362789B2 (en) | Systems, computer programs and methods for determining genetic relationships between sperm donors, oocyte donors and their respective conceptuses | |
CN112226529A (en) | SNP molecular marker of wax gourd blight-resistant gene and application | |
CN102831331B (en) | Primer design developing method of length polymorphism sign based on restriction enzyme digestion database-establishing pair-end sequencing | |
Sell | Addressing challenges of ancient DNA sequence data obtained with next generation methods | |
US20220364080A1 (en) | Methods for dna library generation to facilitate the detection and reporting of low frequency variants | |
JP7446343B2 (en) | Systems, computer programs and methods for determining genome ploidy | |
EP3409788B1 (en) | Method and system for nucleic acid sequencing | |
US20200075124A1 (en) | Methods and systems for detecting allelic imbalance in cell-free nucleic acid samples | |
US20140136121A1 (en) | Method for assembling sequenced segments | |
Hu et al. | Processing UMI Datasets at High Accuracy and Efficiency with the Sentieon ctDNA Analysis Pipeline | |
US20130143746A1 (en) | Method for detecting gene region features based on inter-alu polymerase chain reaction | |
RU2799654C2 (en) | Sequence graph-based tool for determining variation in short tandem repeat areas | |
US20220356513A1 (en) | Synthetic polynucleotides and method of use thereof in genetic analysis | |
Fatima | Whole-Genome Sequencing of two Swedish Individuals on PromethION | |
Fu | Analysis of Admixed Animals using Indirect Haplotype Information from Existing Technologies |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Zhang Xianglin; Zhang Yitong; Tao Ye; Zheng Zequn; Hu Qiuping. Inventor before: Tao Ye; Qian Gang; Zheng Zequn; Hu Qiuping. |
COR | Change of bibliographic data | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |