CN111321209A - Method for double-end correction of circulating tumor DNA sequencing data - Google Patents
- Publication number
- CN111321209A
- Authority
- CN
- China
- Prior art keywords
- sequencing
- overlap
- sequencing data
- double
- correction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Abstract
The invention discloses a method for double-end (paired-end) correction of circulating tumor DNA sequencing data, comprising cfDNA extraction, target capture library construction and sequencing; sequencing data quality control; and a double-end correction step for the sequencing data. The method uses double-end sequencing to sequence the two ends of the same DNA fragment simultaneously from the forward and reverse directions, and corrects bases in the sequence overlapping region of ctDNA high-throughput sequencing data. Because the overlapping-region sequences of the two reads should be consistent, this reduces the sequencing error rate, improves the accuracy of ctDNA gene mutation detection and analysis, particularly for low-abundance gene mutations, reduces the false positive rate, and increases the application value of ctDNA detection in clinical treatment.
Description
Technical Field
The invention belongs to the technical field of biology, and particularly relates to a method for double-end correction of circulating tumor DNA sequencing data.
Background
In recent years, the incidence and mortality of tumors have continued to rise and patients are being diagnosed at increasingly younger ages, making cancer one of the major threats to life and health and a heavy social burden. The 5-year survival rate of Chinese tumor patients is about 40 percent, far behind the 60 percent of developed countries. The situation for tumor prevention and treatment in China is therefore severe, and effective methods are urgently needed to improve the prevention, control, diagnosis and treatment of cancer and the survival rate of patients.
Gene mutations have been shown to play an important role in regulating tumor cell growth and progression. Because of tumor heterogeneity and the complexity of the mutations involved, conventional detection methods cannot accurately detect cancer-related gene mutations. With the rapid development of high-throughput sequencing and computing technology, combining high-throughput sequencing with optimized DNA isolation and extraction, targeted capture, and bioinformatic analysis makes it possible to accurately detect and analyze ultra-low-frequency mutations, providing a foundation for the wide clinical application of precision tumor treatment.
Circulating tumor DNA (ctDNA) is a small DNA fragment derived from tumor cells, has a length of about 170bp, is released from the tumor cells to peripheral blood circulation and then is cleaved to form endogenous single-stranded or double-stranded DNA, and carries molecular mutation information consistent with primary tumor tissues. A large number of studies show that ctDNA has high consistency with genome information of tumor tissues. Therefore, ctDNA assays may be used as a complement to clinical tissue sample gene assays or as a replacement in some cases.
Because ctDNA makes up a very small fraction of cell-free DNA (cfDNA), below 0.1% in some samples, ctDNA gene mutation detection is easily affected by various interfering factors (DNA extraction, library construction, targeted capture and so on). The error rate of high-throughput sequencing is about one in a thousand, and errors of this magnitude strongly affect the accuracy of tumor gene mutation detection, especially the detection of mutations present at extremely low abundance in ctDNA. Reducing the error rate of the sequencing result is therefore a key technical step in ctDNA gene mutation detection. With double-end sequencing, the two ends of the same DNA fragment are sequenced from the forward and reverse directions, so the two reads can correct each other to some extent; moreover, when the DNA fragment is short, the two reads overlap, and using the overlapping region to correct sequencing errors can improve the accuracy of ctDNA gene mutation detection.
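To make the scale of the problem concrete, the following minimal sketch compares the expected number of true mutant reads with the expected number of reads showing one particular erroneous base at a single position. The sequencing depth of 10,000 and the assumption that errors are spread evenly over the three alternative bases are illustrative assumptions and do not come from the source.

```python
def expected_read_counts(depth, mutant_fraction, error_rate):
    """Return (expected true mutant reads, expected reads showing one
    specific wrong base) at a single genomic position.

    Assumes, for illustration only, that a sequencing error is equally
    likely to produce any of the three alternative bases.
    """
    true_mutant = depth * mutant_fraction
    same_base_errors = depth * error_rate / 3
    return true_mutant, same_base_errors


# Hypothetical depth; the 0.1% abundance and 1/1000 error rate are from the text.
mutant, noise = expected_read_counts(depth=10_000,
                                     mutant_fraction=0.001,
                                     error_rate=0.001)
print(f"true mutant reads: {mutant:.1f}, same-base error reads: {noise:.1f}")
```

Under these assumptions the noise is of the same order as the signal, which is why lowering the per-base error rate matters for low-abundance mutations.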
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for double-end correction of circulating tumor DNA sequencing data, in which base correction is performed on the overlapping region of cfDNA double-end sequencing reads. It is realized by the following technical scheme:
the invention discloses a method for double-end correction of circulating tumor DNA sequencing data, which comprises the following steps:
1) cfDNA extraction, target capture library construction and sequencing:
extracting cfDNA in sample plasma by using a magnetic bead method for sample library construction; adding sequencing adapters at two ends of cfDNA molecules, wherein the sequencing adapters contain 8bp tag sequences for off-line data splitting, performing hybridization capture by using a liquid phase molecular probe, capturing target DNA fragments, and completing library construction; sequencing the constructed library by using a high-throughput sequencer, wherein the sequencing read length is 150 bp;
2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step 1) according to different label sequences, and performing quality control on the split sequencing data;
3) double-end correction of sequencing data:
performing double-end data correction on the quality control qualified sample in the step 2), wherein the specific method is as follows;
a) performing reverse-complement conversion on the R2 sequence, searching for the shared starting positions of R1 and R2 by using Kmers, and judging whether an overlap exists; R1 is the forward sequencing read, R2 is the reverse sequencing read, a Kmer is a nucleotide fragment of length K bp obtained by splitting a sequencing read, and the overlap is the sequence overlapping region;
b) if the overlap exists, judging the positions of the overlap at the leftmost end and the rightmost end, namely the positions of the first same Kmer and the last same Kmer;
c) judging whether the overlap lengths of R1 and R2 are consistent, and discarding the two fragments if the overlap lengths are inconsistent;
d) judging the overlap length, setting a threshold value to be 40bp, and if the overlap length is smaller than the threshold value, not correcting;
e) correcting the wrong base in the overlap region, and if the correction quantity of the same overlap is more than 5, discarding the sequencing sequence of the segment;
4) using sequence comparison software to perform sequence comparison on the corrected sequencing data obtained in the step 3) and a standard human genome to generate a comparison result file;
5) performing gene mutation detection analysis on the result file obtained in the step 4) by using mutation detection software;
6) performing functional annotation of the gene mutation results of step 5) by using annotation software.
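Steps a) and b) of the double-end correction can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the choice of K = 8 as default, and the handling of repeated Kmers (first occurrence for the left boundary, last occurrence for the right boundary) are assumptions, since the source does not specify them.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (step a)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, r2, k=8):
    """Locate the overlap of R1 and reverse-complemented R2 via shared Kmers.

    Returns half-open coordinates (r1_start, r1_end, r2_start, r2_end)
    running from the first to the last shared Kmer (step b), or None if
    no Kmer is shared. Coordinates for R2 refer to its reverse complement.
    """
    r2rc = reverse_complement(r2)
    first_pos, last_pos = {}, {}
    # Index each Kmer of R1 by its first and last position.
    for i in range(len(r1) - k + 1):
        kmer = r1[i:i + k]
        first_pos.setdefault(kmer, i)
        last_pos[kmer] = i
    r2_kmers = [r2rc[j:j + k] for j in range(len(r2rc) - k + 1)]
    shared = [j for j, kmer in enumerate(r2_kmers) if kmer in first_pos]
    if not shared:
        return None
    j_first, j_last = shared[0], shared[-1]
    i_first = first_pos[r2_kmers[j_first]]
    i_last = last_pos[r2_kmers[j_last]]
    return i_first, i_last + k, j_first, j_last + k


# Toy 30 bp fragment read with 20 bp reads from each end (hypothetical data).
fragment = "ACGTTGCAAGGCTTAGCCATGGATCCGTAA"
r1 = fragment[:20]
r2 = reverse_complement(fragment[10:])
print(find_overlap(r1, r2, k=4))  # (10, 20, 0, 10)
```

The two spans returned for R1 and R2 are what step c) compares: equal lengths indicate an ungapped overlap, while unequal lengths suggest an insertion or deletion in one read.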
As a further improvement, the method for correcting the wrong base in the overlap region in step 3) of the invention is as follows:
when the sequencing quality values of the R1 and R2 bases are both greater than or equal to 30, the bases at both positions are replaced by N;
when the sequencing quality value of one of the R1 and R2 bases is greater than 30 and that of the other is less than 30, the base with quality below 30 is replaced by the base with quality above 30;
when the sequencing quality values of the R1 and R2 bases are both less than 30, the bases at both positions are replaced by N;
the above operation is applied to all reads in turn.
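The three rules for a disagreeing position can be expressed as a small function. This is a sketch under one stated assumption: the source writes the second rule with strict inequalities (greater than 30, less than 30) but the first with "greater than or equal to 30", so a quality of exactly 30 is treated here as high quality. The function name and signature are hypothetical, and both bases are assumed to be in the same orientation (R2 already reverse-complemented).

```python
def resolve_mismatch(base1, qual1, base2, qual2, threshold=30):
    """Apply the mismatch rules to one overlap position.

    Returns the (R1 base, R2 base) pair after correction.
    Assumption: a quality of exactly `threshold` counts as high quality.
    """
    if base1 == base2:
        return base1, base2      # bases agree, nothing to correct
    high1, high2 = qual1 >= threshold, qual2 >= threshold
    if high1 and not high2:
        return base1, base1      # trust the high-quality R1 base
    if high2 and not high1:
        return base2, base2      # trust the high-quality R2 base
    return "N", "N"              # both high or both low: mask both


print(resolve_mismatch("A", 35, "C", 36))  # ('N', 'N')
print(resolve_mismatch("A", 35, "C", 20))  # ('A', 'A')
print(resolve_mismatch("A", 10, "C", 20))  # ('N', 'N')
```

Masking when both bases are high quality is the conservative choice: neither read can be preferred, so the position is removed from downstream variant calling instead of being guessed.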
As a further improvement, the sample plasma in step 1) of the present invention is derived from human plasma.
As a further improvement, the high-throughput sequencer in step 1) is an Illumina NextSeq CN500 sequencer, a BGISEQ-100 sequencer, a BGISEQ-1000 sequencer or a DA8600 sequencer.
As a further improvement, the sequencing mode in the step 1) is double-ended sequencing.
As a further improvement, the quality control is carried out on the split sequencing data by using fastqc software in the step 2).
As a further improvement, the software used for sequence alignment in step 4) of the present invention is BWA.
As a further improvement, the software used for mutation detection in step 5) of the present invention is varscan.
As a further improvement, the annotation software for gene mutation results in step 6) of the present invention is annovar.
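As a rough illustration of how steps 4) to 6) might be chained with the named tools, the sketch below only builds and prints command lines; it does not run them. The source names BWA, varscan and annovar but gives no commands, so everything else is an assumption: the file names, the use of samtools for sorting and pileup generation, the `mpileup2snp` subcommand, VCF output, the hg19 build and the refGene annotation.

```python
import shlex


def pipeline_commands(r1, r2, reference, prefix,
                      varscan_jar="VarScan.jar", annovar_db="humandb/"):
    """Return (argv, stdout_path) pairs for alignment, calling and annotation.

    All paths and option choices are hypothetical placeholders.
    """
    return [
        # Step 4: align corrected reads to the reference genome.
        (["bwa", "mem", reference, r1, r2], f"{prefix}.sam"),
        (["samtools", "sort", "-o", f"{prefix}.bam", f"{prefix}.sam"], None),
        # Step 5: pileup followed by VarScan SNP calling.
        (["samtools", "mpileup", "-f", reference, f"{prefix}.bam"],
         f"{prefix}.mpileup"),
        (["java", "-jar", varscan_jar, "mpileup2snp", f"{prefix}.mpileup",
          "--output-vcf", "1"], f"{prefix}.vcf"),
        # Step 6: functional annotation with ANNOVAR.
        (["table_annovar.pl", f"{prefix}.vcf", annovar_db,
          "-buildver", "hg19", "-out", prefix,
          "-protocol", "refGene", "-operation", "g", "-vcfinput"], None),
    ]


for argv, stdout_path in pipeline_commands("corrected_R1.fq", "corrected_R2.fq",
                                           "ref.fa", "sample"):
    redirect = f" > {stdout_path}" if stdout_path else ""
    print(shlex.join(argv) + redirect)
```

Keeping each command as an argument list makes it straightforward to hand the same data to `subprocess.run` later without shell quoting problems.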
The invention has the following beneficial effects:
The invention adopts double-end sequencing, in which the two ends of the same DNA fragment are sequenced simultaneously from the forward and reverse directions. When the fragment is short, the two reads overlap, and the overlapping region derives from the same DNA fragment. In the absence of sequencing errors, the forward and reverse reads are therefore completely identical in the overlapping region. An algorithm built on this property corrects sequencing errors and thereby improves the accuracy of gene mutation detection.
The length of the ctDNA is about 170bp, and double-end sequencing can generate an overlapping region.
The proportion of ctDNA in cfDNA is very low, below 0.1% in some samples, so ctDNA gene mutation detection is easily affected by various interfering factors. The method corrects bases in the sequence overlapping region of ctDNA high-throughput sequencing data. Because the overlapping-region sequences of the two reads should be consistent, this reduces the sequencing error rate, effectively improves the accuracy of ctDNA gene mutation detection and analysis, particularly for low-abundance gene mutations, reduces the false positive rate, and increases the application value of ctDNA detection in clinical treatment.
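The size of the overlapping region follows directly from the fragment length and the read length. The short sketch below works this out for the figures given in the text (a fragment of about 170 bp and 150 bp reads); it assumes adapter-free reads that start at opposite ends of the fragment, and the function name is hypothetical.

```python
def expected_overlap(fragment_length, read_length):
    """Number of fragment bases covered by both reads of a pair.

    Assumes adapter-free reads starting at opposite ends of the fragment.
    """
    covered_by_each = min(read_length, fragment_length)
    return max(0, 2 * covered_by_each - fragment_length)


print(expected_overlap(170, 150))  # 130
print(expected_overlap(350, 150))  # 0
```

For a typical 170 bp fragment the overlap is about 130 bp, well above the 40 bp threshold used in the correction step, whereas fragments longer than twice the read length have no overlap and cannot be corrected this way.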
Drawings
FIG. 1: the main flow diagram of the scheme of the invention;
FIG. 2: the flow chart of step (3) of the invention is shown schematically.
Detailed Description
The technical solution of the present invention is further illustrated by the following specific examples:
(1) cfDNA extraction, target capture library construction and sequencing:
extracting cfDNA in sample plasma by using a magnetic bead method for sample library construction; the sample plasma is derived from human plasma;
adding sequencing adapters at two ends of cfDNA molecules, wherein the sequencing adapters contain 8bp tag sequences for off-line data splitting, performing hybridization capture by using a liquid phase molecular probe, capturing target DNA fragments, and completing library construction;
sequencing the constructed library on a high-throughput sequencer in double-end mode with a sequencing read length of 150 bp, wherein the high-throughput sequencer is an Illumina NextSeq CN500 sequencer, a BGISEQ-100 sequencer, a BGISEQ-1000 sequencer or a DA8600 sequencer;
(2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step (1) according to different label sequences, and performing quality control on the split sequencing data by using fastqc software;
(3) double-end correction of sequencing data:
performing double-end data correction on the QC qualified samples in the step (2), wherein the specific method is as follows;
a) Performing reverse-complement conversion on the R2 (reverse sequencing read) sequence, searching for the shared starting positions of R1 (forward sequencing read) and R2 by using Kmers (nucleotide fragments of length K bp obtained by splitting a sequencing read), and judging whether an overlap (sequence overlapping region) exists.
b) If an overlap exists, determining its leftmost and rightmost positions, namely the positions of the first and the last shared Kmer.
c) Judging whether the overlap lengths on R1 and R2 are consistent; if they are not, discarding both reads, which would otherwise cause false positive single-base insertion-deletion mutations in subsequent variant detection.
d) Judging the overlap length, setting the threshold value to be 40bp, and if the overlap length is smaller than the threshold value, not correcting.
e) Correcting the erroneous bases in the overlap region; if more than 5 corrections are made in the overlap of the same fragment, the reads of that fragment are discarded. Where the corresponding bases of R1 and R2 in the overlap disagree, they are corrected as follows:
1. When the sequencing quality values of the R1 and R2 bases are both greater than or equal to 30, the bases at both positions are replaced by N.
2. When the sequencing quality value of one of the R1 and R2 bases is greater than 30 and that of the other is less than 30, the base with quality below 30 is replaced by the base with quality above 30.
3. When the sequencing quality values of the R1 and R2 bases are both less than 30, the bases at both positions are replaced by N.
The above operation is applied to all reads in turn.
(4) Performing sequence alignment of the corrected sequencing data obtained in step (3) against a standard human genome by using BWA software to generate an alignment result file;
(5) performing gene mutation detection analysis on the result file obtained in the step (4) by using varscan software;
(6) performing functional annotation of the gene mutation results of step (5) by using annovar software.
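The pair-level decisions in steps c) to e) of the double-end correction (discard on inconsistent overlap length, skip overlaps shorter than 40 bp, discard when more than 5 corrections are needed) can be combined into one function. This is a sketch, not the patented implementation: the function name and arguments are hypothetical, the overlap coordinates are assumed to have been found already, every disagreeing position is counted as one correction, and a quality of exactly 30 is treated as high quality.

```python
def correct_pair(r1, q1, r2rc, q2rc, span1, span2,
                 min_overlap=40, max_corrections=5, q_threshold=30):
    """Apply steps c) to e) to one read pair.

    r1/q1: forward read and its Phred scores. r2rc/q2rc: reverse read after
    reverse-complementing, with scores reversed to match. span1 and span2
    are half-open (start, end) overlap coordinates on r1 and r2rc.
    Returns (r1, r2rc) after correction, or None if the pair is discarded.
    """
    len1, len2 = span1[1] - span1[0], span2[1] - span2[0]
    if len1 != len2:
        return None                  # step c: possible indel, discard pair
    if len1 < min_overlap:
        return r1, r2rc              # step d: overlap too short, unchanged
    out1, out2 = list(r1), list(r2rc)
    corrections = 0
    for offset in range(len1):
        i, j = span1[0] + offset, span2[0] + offset
        if r1[i] == r2rc[j]:
            continue
        corrections += 1
        high1, high2 = q1[i] >= q_threshold, q2rc[j] >= q_threshold
        if high1 and not high2:
            out2[j] = r1[i]          # keep the high-quality R1 base
        elif high2 and not high1:
            out1[i] = r2rc[j]        # keep the high-quality R2 base
        else:
            out1[i] = out2[j] = "N"  # both high or both low: mask
    if corrections > max_corrections:
        return None                  # step e: too many corrections
    return "".join(out1), "".join(out2)


# Hypothetical 50 bp reads that overlap completely, with one low-quality error.
read1 = "ACGTACGTAC" * 5
read2 = read1[:5] + "T" + read1[6:]
quals1 = [35] * 50
quals2 = [35] * 5 + [10] + [35] * 44
fixed = correct_pair(read1, quals1, read2, quals2, (0, 50), (0, 50))
print(fixed == (read1, read1))  # True
```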
By applying the technical scheme of the invention, a set of cfDNA reference standards with 8 known mutation sites and a mutation frequency of 0.5% was analyzed to verify the accuracy of the detection results. The specific process was as follows:
(1) cfDNA extraction, target capture library construction and sequencing:
extracting and purifying the cfDNA of the reference standard with a commercial nucleic acid extraction kit; because the cfDNA does not need to be fragmented, 30 ng of the purified cfDNA is used directly for sample library construction;
adding sequencing adapters at both ends of the cfDNA molecules of 100-350 bp, wherein the sequencing adapters contain 8 bp tag sequences used to distinguish the data of different samples, performing hybridization capture by using a liquid phase molecular probe, capturing target DNA fragments, and completing library construction;
finally, performing double-end sequencing on the constructed library by using an Illumina NextSeq CN500 sequencer, wherein the sequencing read length is 150 bp;
(2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step (1) according to different label sequences, and performing quality control on the split sequencing data by using fastqc software;
(3) double-end correction of sequencing data:
performing double-end data correction on the quality control qualified sample in the step (2), wherein the specific method is as follows;
a) Performing reverse-complement conversion on the R2 sequence, searching for the shared starting positions of R1 and R2 by using Kmers, and judging whether an overlap exists.
b) If an overlap exists, determining its leftmost and rightmost positions, namely the positions of the first and the last shared Kmer.
c) Judging whether the overlap lengths on R1 and R2 are consistent; if they are not, discarding both reads, which would otherwise cause false positive single-base insertion-deletion mutations in subsequent variant detection.
d) Judging the overlap length against a threshold of 40 bp; if the overlap is shorter than the threshold, no correction is performed.
e) Correcting the erroneous bases in the overlap region; if more than 5 corrections are made in the overlap of the same fragment, the reads of that fragment are discarded. Where the corresponding bases of R1 and R2 in the overlap disagree, they are corrected as follows:
1. When the sequencing quality values of the R1 and R2 bases are both greater than or equal to 30, the bases at both positions are replaced by N.
2. When the sequencing quality value of one of the R1 and R2 bases is greater than 30 and that of the other is less than 30, the base with quality below 30 is replaced by the base with quality above 30.
3. When the sequencing quality values of the R1 and R2 bases are both less than 30, the bases at both positions are replaced by N.
The above operation is applied to all reads in turn;
(4) performing sequence alignment of the corrected sequencing data obtained in step (3) against a standard human genome by using BWA software to generate an alignment result file;
(5) performing gene mutation detection and analysis on the result file obtained in step (4) by using varscan software;
(6) performing functional annotation of the gene mutation results of step (5) by using annovar software.
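The tag-based splitting of step (2) could look like the sketch below. It assumes that the 8 bp tag appears as the last colon-separated field of each FASTQ header line, as in Illumina-style headers; the source does not say where the tag is recorded, and the sample names, header strings and function name are hypothetical.

```python
from collections import defaultdict


def split_by_tag(fastq_lines, tag_to_sample):
    """Group 4-line FASTQ records by the tag at the end of the header line.

    Assumes headers of the form '@... 1:N:0:TAG'. Records whose tag is not
    listed in tag_to_sample are collected under 'undetermined'.
    """
    records = defaultdict(list)
    lines = [line.rstrip("\n") for line in fastq_lines]
    for start in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[start:start + 4]
        tag = header.rsplit(":", 1)[-1]
        sample = tag_to_sample.get(tag, "undetermined")
        records[sample].append((header, seq, qual))
    return dict(records)


# Two toy records; only the first tag is assigned to a sample.
example = [
    "@M1:1:FC:1:1:1:1 1:N:0:ACGTACGT", "ACGT", "+", "IIII",
    "@M1:1:FC:1:1:1:2 1:N:0:TTGCAAGC", "GGCC", "+", "IIII",
]
by_sample = split_by_tag(example, {"ACGTACGT": "sample_A"})
print(sorted(by_sample))  # ['sample_A', 'undetermined']
```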
The detection of the 8 known mutation sites in the mutation detection results is summarized in Table 1: all 8 gene mutation sites were accurately detected in the 3 HD778 samples. Without double-end correction of the sequencing data, 2 low-frequency false positive mutations were detected in each sample; after double-end correction, no low-frequency false positive mutations were detected in any sample. The double-end correction method provided by the invention therefore effectively improves the accuracy of low-frequency mutation detection.
Wherein, cfDNA: cell-free DNA;
magnetic bead method: DNA is specifically adsorbed by magnetic beads;
sequencing quality value: a measure of the probability that a base was called incorrectly; the higher the sequencing quality value, the better the sequencing quality;
Illumina NextSeq CN500, BGISEQ-100, BGISEQ-1000 and DA8600 are models of high-throughput sequencers;
double-end sequencing: sequencing both ends of a DNA fragment;
fastqc, BWA, varscan and annovar are software names that have no established Chinese names in the industry and are therefore given directly in English or as abbreviations.
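The relation between a sequencing quality value and the error probability can be made explicit with the standard Phred definition, Q = -10 * log10(p). The sketch below converts quality values to error probabilities and decodes a FASTQ quality string; the Phred+33 character encoding is an assumption about the input files, and the function names are hypothetical.

```python
def phred_to_error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


print(f"{phred_to_error_probability(30):.4f}")
print(decode_qualities("?5I"))  # [30, 20, 40]
```

A quality value of 30, the threshold used in the correction rules, therefore corresponds to an estimated error probability of one in a thousand for that base.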
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and refinements without departing from the core technical features of the present invention, and such modifications and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (9)
1. A method for paired end correction of circulating tumor DNA sequencing data, comprising the steps of:
1) cfDNA extraction, target capture library construction and sequencing:
extracting cfDNA in sample plasma by using a magnetic bead method for sample library construction; adding sequencing adapters at two ends of cfDNA molecules, wherein the sequencing adapters contain 8bp tag sequences for off-line data splitting, performing hybridization capture by using a liquid phase molecular probe, capturing target DNA fragments, and completing library construction; sequencing the constructed library by using a high-throughput sequencer, wherein the sequencing read length is 150 bp;
2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step 1) according to different label sequences, and performing quality control on the split sequencing data;
3) double-end correction of sequencing data:
performing double-end data correction on the quality control qualified sample in the step 2), wherein the specific method is as follows;
a) performing reverse-complement conversion on the R2 sequence, searching for the shared starting positions of R1 and R2 by using Kmers, and judging whether an overlap exists; the R1 is the forward sequencing read, the R2 is the reverse sequencing read, the Kmer is a nucleotide fragment of length K bp obtained by splitting a sequencing read, and the overlap is the sequence overlapping region;
b) if the overlap exists, judging the positions of the overlap at the leftmost end and the rightmost end, namely the positions of the first same Kmer and the last same Kmer;
c) judging whether the overlap lengths of R1 and R2 are consistent, and discarding the two fragments if the overlap lengths are inconsistent;
d) judging the overlap length, setting a threshold value to be 40bp, and if the overlap length is smaller than the threshold value, not correcting;
e) correcting the wrong base in the overlap region, and if the correction quantity of the same overlap is more than 5, discarding the sequencing sequence of the segment;
4) using sequence comparison software to perform sequence comparison on the corrected sequencing data obtained in the step 3) and a standard human genome to generate a comparison result file;
5) performing gene mutation detection analysis on the result file obtained in the step 4) by using mutation detection software;
6) performing functional annotation of the gene mutation results of step 5) by using annotation software.
2. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, wherein the erroneous bases in the overlap region in step 3) are corrected as follows:
when the sequencing quality values of the R1 and R2 bases are both greater than or equal to 30, the bases at both positions are replaced by N;
when the sequencing quality value of one of the R1 and R2 bases is greater than 30 and that of the other is less than 30, the base with quality below 30 is replaced by the base with quality above 30;
when the sequencing quality values of the R1 and R2 bases are both less than 30, the bases at both positions are replaced by N;
the above operation is applied to all reads in turn.
3. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1 or 2, wherein the sample plasma of step 1) is derived from human plasma.
4. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, wherein the high-throughput sequencer in step 1) is an Illumina NextSeq CN500 sequencer, a BGISEQ-100 sequencer, a BGISEQ-1000 sequencer or a DA8600 sequencer.
5. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, 2 or 4, wherein the sequencing mode in step 1) is paired end sequencing.
6. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, wherein in step 2) quality control of the split sequencing data is performed using fastqc software.
7. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, wherein the software used for sequence alignment in step 4) is BWA.
8. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, wherein the software used for mutation detection in step 5) is varscan.
9. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, wherein the gene mutation result annotation software in step 6) is annovar.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010220739.9A CN111321209A (en) | 2020-03-26 | 2020-03-26 | Method for double-end correction of circulating tumor DNA sequencing data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010220739.9A CN111321209A (en) | 2020-03-26 | 2020-03-26 | Method for double-end correction of circulating tumor DNA sequencing data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111321209A true CN111321209A (en) | 2020-06-23 |
Family
ID=71169626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010220739.9A Pending CN111321209A (en) | 2020-03-26 | 2020-03-26 | Method for double-end correction of circulating tumor DNA sequencing data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111321209A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112687339A (en) * | 2021-01-21 | 2021-04-20 | 深圳吉因加医学检验实验室 | Method and device for counting sequence errors in plasma DNA fragment sequencing data |
CN115083521A (en) * | 2022-07-22 | 2022-09-20 | 角井(北京)生物技术有限公司 | Method and system for identifying tumor cell group in single cell transcriptome sequencing data |
CN115831233A (en) * | 2023-02-07 | 2023-03-21 | 杭州联川基因诊断技术有限公司 | mTag-based targeted sequencing data preprocessing method, equipment and medium |
CN116356001A (en) * | 2023-02-07 | 2023-06-30 | 江苏先声医学诊断有限公司 | Dual background noise mutation removal method based on blood circulation tumor DNA |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107523563A (en) * | 2017-09-08 | 2017-12-29 | 杭州和壹基因科技有限公司 | A kind of Bioinformatics method for Circulating tumor DNA analysis |
CN108595918A (en) * | 2018-01-15 | 2018-09-28 | 臻和(北京)科技有限公司 | The processing method and processing device of Circulating tumor DNA repetitive sequence |
CN109762881A (en) * | 2019-01-31 | 2019-05-17 | 中山拓普基因科技有限公司 | It is a kind of for detecting the Bioinformatic methods in the ultralow frequency mutational site in tumor patient blood ctDNA |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112687339A (en) * | 2021-01-21 | 2021-04-20 | 深圳吉因加医学检验实验室 | Method and device for counting sequence errors in plasma DNA fragment sequencing data |
CN115083521A (en) * | 2022-07-22 | 2022-09-20 | 角井(北京)生物技术有限公司 | Method and system for identifying tumor cell group in single cell transcriptome sequencing data |
CN115831233A (en) * | 2023-02-07 | 2023-03-21 | 杭州联川基因诊断技术有限公司 | mTag-based targeted sequencing data preprocessing method, equipment and medium |
CN115831233B (en) * | 2023-02-07 | 2023-05-16 | 杭州联川基因诊断技术有限公司 | Targeted sequencing data preprocessing method, equipment and medium based on mTag |
CN116356001A (en) * | 2023-02-07 | 2023-06-30 | 江苏先声医学诊断有限公司 | Dual background noise mutation removal method based on blood circulation tumor DNA |
CN116356001B (en) * | 2023-02-07 | 2023-12-15 | 江苏先声医学诊断有限公司 | Dual background noise mutation removal method based on blood circulation tumor DNA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |