CN114937475A - Automatic evaluation method for error correction result of PacBio sequencing data - Google Patents
Automatic evaluation method for error correction result of PacBio sequencing data Download PDFInfo
- Publication number
- CN114937475A CN114937475A CN202210380137.9A CN202210380137A CN114937475A CN 114937475 A CN114937475 A CN 114937475A CN 202210380137 A CN202210380137 A CN 202210380137A CN 114937475 A CN114937475 A CN 114937475A
- Authority
- CN
- China
- Prior art keywords
- error correction
- sequence
- corrected
- sequences
- contigs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000012937 correction Methods 0.000 title claims abstract description 63
- 238000012163 sequencing technique Methods 0.000 title claims abstract description 38
- 238000011156 evaluation Methods 0.000 title claims abstract description 17
- 238000000034 method Methods 0.000 claims abstract description 30
- 238000007619 statistical method Methods 0.000 claims abstract description 10
- 238000003908 quality control method Methods 0.000 claims abstract description 8
- 230000035945 sensitivity Effects 0.000 claims abstract description 7
- 238000010586 diagram Methods 0.000 claims description 6
- 238000012216 screening Methods 0.000 claims description 3
- 238000009966 trimming Methods 0.000 claims description 3
- 238000012217 deletion Methods 0.000 claims description 2
- 230000037430 deletion Effects 0.000 claims description 2
- 238000003780 insertion Methods 0.000 claims description 2
- 230000037431 insertion Effects 0.000 claims description 2
- 230000000052 comparative effect Effects 0.000 abstract 1
- 238000005516 engineering process Methods 0.000 description 4
- 230000007547 defect Effects 0.000 description 3
- 238000007671 third-generation sequencing Methods 0.000 description 3
- 238000001712 DNA sequencing Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000004088 simulation Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000011511 automated evaluation Methods 0.000 description 1
- 210000000349 chromosome Anatomy 0.000 description 1
- 238000000205 computational method Methods 0.000 description 1
- 230000004907 flux Effects 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000012268 genome sequencing Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 239000002773 nucleotide Substances 0.000 description 1
- 125000003729 nucleotide group Chemical group 0.000 description 1
- 108090000623 proteins and genes Proteins 0.000 description 1
- 238000010008 shearing Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Data Mining & Analysis (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an automatic evaluation method of an error correction result of PacBio sequencing data, which comprises the steps of carrying out quality control on original PacBio sequencing data to obtain a sequencing sequence which accords with a set threshold range; correcting the clean reads after the quality control by using an error correction method to be evaluated to obtain a sequence marked as corrected reads, and counting memory resources and time consumption required by error correction; comparing and analyzing clean reads and corrected reads before and after error correction to obtain error correction output rate TH and average length of sequences after error correction; comparing the corrected reads with the corresponding reference genome to obtain an alignment sequence MSA, and performing statistical analysis to obtain the correction sensitivity and the correction rate; assembling the corrected reads to obtain contigs; and (3) comparing the contigs with the corresponding reference genome to obtain the statistic analysis of the comparative contigs MSA, and counting the number of the contigs, the genome coverage rate and the NGA 50.
Description
Technical Field
The invention relates to the technical field of biological information, in particular to an automatic evaluation method for a PacBio sequencing data error correction result.
Background
In recent years, the third generation sequencing technology, represented by PacBio, has become one of the most widely used DNA sequencing technologies. The third generation sequencing technology enables the read length to reach more than 10kb on the premise of ensuring high flux by a method for displaying the nucleotide sequence under single molecule resolution in real time, and solves a plurality of problems caused by reading length (usually only 150 to 600bp) in second generation sequencing, such as difficulty in identifying large structural variation on chromosome, observation gene fusion, difficulty in detecting variable shearing on RNA level and the like.
Although the PacBio sequencing technique has made a major breakthrough in read length, it introduces a very high error rate, typically around 15%. Therefore, reducing the sequence error rate prior to downstream applications is an important step in analyzing PacBio DNA sequencing data. There are methods for improving the accuracy of sequences by improving sequencing technologies, such as the Circular Consensus Sequencing (CCS) method proposed by PacBio corporation. However, such methods can significantly reduce sequence read length and also result in increased sequencing costs. In contrast, computational methods can improve sequence accuracy at a lower cost sequencing protocol, and are an efficient and feasible solution.
There have been a number of computational approaches to solving the error correction problem of PacBio sequencing data. These methods can be classified into two categories, i.e., a hybrid correction strategy method and a self-correction strategy method, depending on the strategy employed. These methods all performed differently for PacBio sequencing data of different scale genomes at different sequencing depths. In actual use, researchers usually only need to select an error correction method with optimal performance according to the research interests and data of the researchers. Therefore, a reasonable and effective automatic evaluation method for the sequencing data error correction result of PacBio is significant.
At present, less automatic evaluation tools exist in the field of PacBio sequencing data error correction, and some defects exist, such as evaluation is carried out only by using small-scale genome sequencing data or simulation data, and the performance of error correction tools on large-scale genomes is ignored; only data with a single depth are used, and the influence of depth change in the actual sequencing process on the performance is ignored; the evaluation index is relatively one-sided, and evaluation is not performed from multiple aspects such as error correction performance, resource requirements and downstream application performance.
Disclosure of Invention
The invention aims to provide an automatic evaluation method for the error correction result of PacBio sequencing data, aiming at the defects in the background art.
The technical scheme for realizing the purpose of the invention is as follows:
an automatic evaluation method for error correction results of PacBio sequencing data comprises the following steps:
1) performing quality control on the original PacBio sequencing data to obtain a sequencing sequence which accords with a set threshold range; specifically, factors of base mass fraction, sequence mass fraction, GC content and sequence repetition level of original PacBio sequencing data are checked, a threshold value is set according to various factors, the original sequencing data are screened, and after sequences lower than and higher than the threshold value are removed, the obtained sequences are marked as clean reads;
2) correcting the clean reads after quality control by using an error correction method to be evaluated to obtain sequences marked as corrected reads, and counting memory resources and time consumption required by error correction while correcting the errors;
3) comparing the sequence clean reads before error correction with the sequence corrected reads after error correction, and performing statistical analysis to obtain an error correction output rate TH and an average length of the sequence after error correction; the method specifically comprises the following steps:
3-1) comparing the data quantity of the corrected sequences and the clear sequences before error correction, and calculating the error correction output rate TH in the error correction process;
3-2) calculating the average length of corrected reads of the sequence after error correction, and visualizing the length distribution;
4) comparing the corrected sequences with the corresponding reference genomes to obtain a comparison sequence MSA;
5) performing statistical analysis on the compared sequence MSA to obtain the sensitivity and accuracy of error correction, and specifically comprising the following steps:
5-1) counting the successfully corrected base number TP and the error corrected base number FN of the corrected sequences;
5-2) calculating the sensitivity of the error correction process by using TH, TP and FN;
5-3) counting the total number of the inserted, deleted and replaced three wrong bases contained in the corrected sequence corrected reads;
5-4) calculating the correct rate of corrected sequences of the three wrong base numbers obtained in the step 5-3);
5-5) calculating the genome coverage rate of the corrected sequences;
6) assembling the corrected sequences corrected reads to obtain contigs sequences, which specifically comprises the following steps:
6-1) carrying out pairwise comparison on the corrected sequences, wherein each reads only retains two comparison results with optimal quality and length, and calculating the overall coverage rate estimation;
6-2) comparing and further screening the results according to the coverage rate information, and discarding the comparison with the coverage rate smaller than a threshold value;
6-3) generating a character string diagram by discarding the comparison result with low coverage rate, trimming the tail end, removing the overlapped part with the support quantity smaller than the threshold value, and eliminating short branches;
6-4) generating contigs sequences by using the character string diagram;
7) comparing the contigs sequence obtained by assembly with a reference genome corresponding to the contigs sequence to obtain a comparison sequence contigs MSA;
8) carrying out statistical analysis on the contigs MSA obtained in the step 7), specifically counting the number of contigs sequences, calculating the genome coverage rate of the contigs sequences and calculating the NGA50 of the contigs sequences.
The automatic evaluation method for the error correction result of PacBio sequencing data is suitable for real data including large genomes and various sequencing depths, and overcomes the defect that only simulation data or real data of small genomes can be used in the prior art. Meanwhile, the method takes the practical use flow of the third generation sequencing data as a starting point, selects 8 indexes, and evaluates the error correction performance, the resource requirement, the downstream application performance and other aspects, and is more comprehensive compared with the prior art.
Researchers can systematically consider computing resources, data magnitude and experimental purposes according to research interests, determine reasonable evaluation index weight, and select a proper correction tool by using the automatic evaluation method provided by the invention. Developers in the field can verify the performance of the development method from multiple dimensions according to the automatic evaluation method, so as to guide the development method to improve the method or prove the effectiveness of the method.
Drawings
Fig. 1 is a flowchart of an automated evaluation method for error correction results of PacBio sequencing data.
Detailed Description
The invention will be further described with reference to the following drawings and examples, which are not intended to limit the invention.
The embodiment is as follows:
an automatic evaluation method for error correction results of PacBio sequencing data comprises the following steps:
1) performing quality control on the original PacBio sequencing data to obtain a sequencing sequence which accords with a set threshold range; specifically, factors such as base mass fraction, sequence mass fraction, GC content and sequence repetition level of original PacBio sequencing data are checked, a threshold value is set according to the factors to screen the original sequencing data, and after sequences lower than and higher than the threshold value are removed, the obtained sequence is marked as clean reads;
wherein raw sequencing data can be statistically reviewed and filtered using FastQC software;
2) calling an error correction tool which is expected to be evaluated, correcting the sequence clean reads after quality control to obtain a sequence marked as corrected reads, monitoring the memory occupation peak value of a program in the error correction process, and counting memory resources and time consumption required by the complete error correction process;
3) comparing the sequence clean reads before error correction with the sequence corrected reads after error correction, and performing statistical analysis to obtain an error correction output rate TH and an average length of the sequence after error correction; the method specifically comprises the following steps:
3-1) dividing the base number of the corrected sequences by the base number of the clean sequences before error correction to obtain the error correction output rate TH in the error correction process;
3-2) calculating the average length of the corrected sequences, and visualizing the length distribution by using a violin graph;
4) comparing the corrected sequences with the corresponding reference genomes to obtain a comparison sequence MSA; specifically, the corrected reads are compared to the corresponding reference genome by using comparison software Minimap2 to obtain a comparison file of the corrected reads, namely a sequence MSA;
5) performing statistical analysis on the compared sequence MSA to obtain the sensitivity and accuracy of error correction, and specifically comprising the following steps:
5-1) counting the number TP of successfully corrected bases and the number FN of erroneous but uncorrected bases of corrected sequences;
5-2) calculating the sensitivity of the error correction process according to the formula (TH × TP/(TP + FN) using TH, TP, and FN;
5-3) counting the total number of the basic groups of the three errors of insertion, deletion and replacement contained in the corrected sequence corrected reads;
5-4) calculating the correct rate of corrected sequences of corrected reads according to the total number of the three wrong bases obtained in the step 5-3), wherein the specific error rate is obtained by dividing the total number of the three wrong bases by the total number of the corrected reads;
5-5) calculating the genome coverage rate of the corrected sequences; specifically, the ratio of the reference genome covered by the corrected reads in the alignment file is calculated;
6) assembling the corrected sequences to obtain contigs sequences, which specifically comprises the following steps:
6-1) carrying out pairwise comparison on the corrected sequences, wherein each reads only retains two comparison results with optimal quality and length, and calculating the overall coverage rate estimation;
6-2) according to the coverage rate information, further screening the results by comparison, and discarding the comparison with the coverage rate less than 50%;
6-3) generating a character string diagram by discarding the comparison result with low coverage, trimming the tail end, removing the overlapped part with the support quantity less than 3, and eliminating short branches;
6-4) generating contigs sequences by using the character string diagram;
the corrected reads may be assembled specifically using minism software.
7) Comparing the contigs sequence obtained by assembly with a reference genome corresponding to the contigs sequence to obtain a comparison sequence contigs MSA; specifically, the assembled contigs can be aligned to the corresponding reference genome by using alignment software Minimap2 to obtain an alignment file of the contigs.
8) Carrying out statistical analysis on the contigs MSA obtained in the step 7), specifically counting the number of contigs sequences, calculating the genome coverage rate of the contigs sequences and calculating the NGA50 of the contigs sequences.
Claims (1)
1. An automatic evaluation method for error correction results of PacBio sequencing data is characterized by comprising the following steps:
1) performing quality control on the original PacBio sequencing data to obtain a sequencing sequence which accords with a set threshold range; specifically, factors of base mass fraction, sequence mass fraction, GC content and sequence repetition level of original PacBio sequencing data are checked, a threshold value is set according to the factors to screen the original sequencing data, and after sequences lower than and higher than the threshold value are removed, the obtained sequence is marked as clean reads;
2) correcting the clean reads after quality control by using an error correction method to be evaluated to obtain sequences marked as corrected reads, and counting memory resources and time consumption required by error correction while correcting the errors;
3) comparing the sequence clean reads before error correction with the sequence corrected reads after error correction, and performing statistical analysis to obtain an error correction output rate TH and an average length of the sequence after error correction; the method specifically comprises the following steps:
3-1) comparing the data quantity of the corrected sequences and the clear sequences before error correction, and calculating the error correction output rate TH in the error correction process;
3-2) calculating the average length of corrected reads of the sequence after error correction, and visualizing the length distribution;
4) comparing the corrected sequences with the corresponding reference genomes to obtain a comparison sequence MSA;
5) performing statistical analysis on the compared sequence MSA to obtain the sensitivity and accuracy of error correction, and specifically comprising the following steps:
5-1) counting the successfully corrected base number TP and the error corrected base number FN of the corrected sequences;
5-2) calculating the sensitivity of the error correction process by using TH, TP and FN;
5-3) counting the total number of the basic groups of the three errors of insertion, deletion and replacement contained in the corrected sequence corrected reads;
5-4) calculating the correct rate of the corrected sequences of the three wrong bases obtained in the step 5-3);
5-5) calculating the genome coverage rate of the corrected sequences;
6) assembling the corrected sequences to obtain contigs sequences, which specifically comprises the following steps:
6-1) carrying out pairwise comparison on the corrected sequences, wherein each reads only retains two comparison results with optimal quality and length, and calculating the overall coverage rate estimation;
6-2) comparing and further screening the results according to the coverage rate information, and discarding the comparison with the coverage rate smaller than a threshold value;
6-3) generating a character string diagram by discarding the comparison result with low coverage rate, trimming the tail end, removing the overlapped part with the support quantity smaller than the threshold value, and eliminating short branches;
6-4) generating contigs sequences by using the character string diagram;
7) comparing the contigs sequence obtained by assembly with a reference genome corresponding to the contigs sequence to obtain a comparison sequence contigs MSA;
8) and (4) carrying out statistical analysis on the contigs MSA obtained in the step 7), specifically counting the number of contigs sequences, calculating the genome coverage rate of the contigs sequences and calculating the NGA50 of the contigs sequences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210380137.9A CN114937475A (en) | 2022-04-12 | 2022-04-12 | Automatic evaluation method for error correction result of PacBio sequencing data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210380137.9A CN114937475A (en) | 2022-04-12 | 2022-04-12 | Automatic evaluation method for error correction result of PacBio sequencing data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114937475A true CN114937475A (en) | 2022-08-23 |
Family
ID=82861884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210380137.9A Pending CN114937475A (en) | 2022-04-12 | 2022-04-12 | Automatic evaluation method for error correction result of PacBio sequencing data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114937475A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110295784A1 (en) * | 2008-12-12 | 2011-12-01 | Jun Wang | Error correcting method of test sequence, corresponding system and gene assembly equipment |
WO2013097149A1 (en) * | 2011-12-29 | 2013-07-04 | 深圳华大基因科技服务有限公司 | Method and device for estimating repeating sequence content of genome |
WO2013097061A1 (en) * | 2011-12-31 | 2013-07-04 | 深圳华大基因研究院 | Bs and rrbs sequencing-based bioinformatics analysis method and device |
US20140188397A1 (en) * | 2011-05-17 | 2014-07-03 | Bgi Tech Solutions Co., Ltd. | Methods of acquiring genome size and error |
CN107563151A (en) * | 2017-09-18 | 2018-01-09 | 杭州和壹基因科技有限公司 | A kind of PacBio sequencing datas assemble the error correction method of obtained genome sequence |
CN108573127A (en) * | 2017-03-14 | 2018-09-25 | 深圳华大基因科技服务有限公司 | Processing method and its application of initial data is sequenced in a kind of nucleic acid third generation |
CN108629156A (en) * | 2017-03-21 | 2018-10-09 | 深圳华大基因科技服务有限公司 | The method, apparatus and computer readable storage medium of three generations's sequencing data error correction |
CN112133368A (en) * | 2020-10-13 | 2020-12-25 | 南开大学 | Automated analysis method of metagenome sequencing data based on third-generation sequencing technology |
CN112687339A (en) * | 2021-01-21 | 2021-04-20 | 深圳吉因加医学检验实验室 | Method and device for counting sequence errors in plasma DNA fragment sequencing data |
CN113496760A (en) * | 2020-04-01 | 2021-10-12 | 深圳华大基因科技服务有限公司 | Polyploid genome assembling method and device based on third-generation sequencing |
-
2022
- 2022-04-12 CN CN202210380137.9A patent/CN114937475A/en active Pending
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110295784A1 (en) * | 2008-12-12 | 2011-12-01 | Jun Wang | Error correcting method of test sequence, corresponding system and gene assembly equipment |
US20140188397A1 (en) * | 2011-05-17 | 2014-07-03 | Bgi Tech Solutions Co., Ltd. | Methods of acquiring genome size and error |
WO2013097149A1 (en) * | 2011-12-29 | 2013-07-04 | 深圳华大基因科技服务有限公司 | Method and device for estimating repeating sequence content of genome |
WO2013097061A1 (en) * | 2011-12-31 | 2013-07-04 | 深圳华大基因研究院 | Bs and rrbs sequencing-based bioinformatics analysis method and device |
CN108573127A (en) * | 2017-03-14 | 2018-09-25 | 深圳华大基因科技服务有限公司 | Processing method and its application of initial data is sequenced in a kind of nucleic acid third generation |
CN108629156A (en) * | 2017-03-21 | 2018-10-09 | 深圳华大基因科技服务有限公司 | The method, apparatus and computer readable storage medium of three generations's sequencing data error correction |
CN107563151A (en) * | 2017-09-18 | 2018-01-09 | 杭州和壹基因科技有限公司 | A kind of PacBio sequencing datas assemble the error correction method of obtained genome sequence |
CN113496760A (en) * | 2020-04-01 | 2021-10-12 | 深圳华大基因科技服务有限公司 | Polyploid genome assembling method and device based on third-generation sequencing |
CN112133368A (en) * | 2020-10-13 | 2020-12-25 | 南开大学 | Automated analysis method of metagenome sequencing data based on third-generation sequencing technology |
CN112687339A (en) * | 2021-01-21 | 2021-04-20 | 深圳吉因加医学检验实验室 | Method and device for counting sequence errors in plasma DNA fragment sequencing data |
Non-Patent Citations (2)
Title |
---|
周卫星;石海鹤;: "高通量测序中序列拼接算法的研究进展", 计算机科学, no. 05, 15 May 2019 (2019-05-15) * |
马东娜;张兴坦;魏柳锋;李仪莹;钟伟民;赵茜;尤民生;: "基因组二代测序数据与三代测序数据的混合校正和组装", 基因组学与应用生物学, no. 04, 25 April 2018 (2018-04-25) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Steinegger et al. | Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold | |
Venturini et al. | Leveraging multiple transcriptome assembly methods for improved gene structure annotation | |
Novák et al. | Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2 | |
CN102206704B (en) | Method and device for assembling genome sequence | |
CN112133368A (en) | Automated analysis method of metagenome sequencing data based on third-generation sequencing technology | |
Springer et al. | ILS-aware analysis of low-homoplasy retroelement insertions: inference of species trees and introgression using quartets | |
Delhomme et al. | Guidelines for RNA-Seq data analysis | |
CN102521528A (en) | Method for screening gene sequence data | |
Rivera-Rivera et al. | LS³: A method for improving phylogenomic inferences when evolutionary rates are heterogeneous among taxa | |
CN118186103A (en) | Lateolabrax japonicus 100k liquid phase chip and application thereof | |
CN112466416B (en) | Material data cleaning method combining nickel-based alloy priori knowledge | |
CN117612605A (en) | Virus whole genome sequence assembly analysis method based on high-throughput sequencing | |
CN114937475A (en) | Automatic evaluation method for error correction result of PacBio sequencing data | |
CN112397148A (en) | Sequence comparison method, sequence correction method and device thereof | |
CN103270175B (en) | Method and system for detecting the insertion sites of transgenic foreign fragments | |
CN111292806B (en) | Transcriptome analysis method by using nanopore sequencing | |
CN112825268B (en) | Sequencing result comparison method and application thereof | |
CN112750501A (en) | Optimized analysis method for macrovirome process | |
CN116469462B (en) | Ultra-low frequency DNA mutation identification method and device based on double sequencing | |
CN108595914B (en) | High-precision prediction method for tobacco mitochondrial RNA editing sites | |
CN111445949A (en) | Method for annotating genome of high-altitude polyploid fish by using nanopore sequencing data | |
CN106021998A (en) | Computation pipeline of single-pass multiple variant calls | |
CN112259169B (en) | Method for rapidly obtaining chloroplast genome from transcriptome data | |
Natsidis et al. | Systematic errors in orthology inference: a bug or a feature for evolutionary analyses? | |
Armstrong | Enabling comparative genomics at the scale of hundreds of species |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |