CN111321209A - Method for double-end correction of circulating tumor DNA sequencing data - Google Patents

Info

Publication number
CN111321209A
Authority
CN
China
Prior art keywords
sequencing
overlap
sequencing data
double
correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010220739.9A
Other languages
Chinese (zh)
Inventor
王军一
肖雯
叶可勇
闫楠
刘杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Heyi Gene Technology Co ltd
Original Assignee
Hangzhou Heyi Gene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-03-26
Filing date
2020-03-26
Publication date
2020-06-23
Application filed by Hangzhou Heyi Gene Technology Co ltd
Priority to CN202010220739.9A
Publication of CN111321209A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50: Mutagenesis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Organic Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Pathology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Oncology (AREA)
  • Hospice & Palliative Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for double-end correction of circulating tumor DNA sequencing data, comprising: cfDNA extraction, target capture library construction and sequencing; sequencing data quality control; and a double-end correction step for the sequencing data. The method uses double-end sequencing to sequence the two ends of the same DNA fragment from the forward and reverse directions, and performs base correction in the sequence overlap regions of ctDNA high-throughput sequencing data. Because the overlap-region sequences of the two reads should be consistent, the sequencing error rate can be reduced. This improves the accuracy of ctDNA gene mutation detection and analysis, particularly for low-abundance gene mutations, reduces the false positive rate, and improves the application value of ctDNA detection in clinical treatment.

Description

Method for double-end correction of circulating tumor DNA sequencing data
Technical Field
The invention belongs to the technical field of biology, and particularly relates to a method for double-end correction of circulating tumor DNA sequencing data.
Background
In recent years, tumor incidence and mortality have continued to rise and patients are trending younger, making tumors one of the major factors that seriously threaten life and health and impose a high social burden. The 5-year survival rate of Chinese tumor patients is about 40%, far behind the 60% of developed countries. The situation for tumor prevention and treatment in China is therefore severe, and effective methods are urgently needed to improve the prevention, control, diagnosis and treatment of cancer and the survival rate of patients.
Gene mutations have been shown to play an important role in the regulation of tumor cell growth and progression. Because of tumor heterogeneity and complex genetic mutations, conventional detection methods cannot accurately detect cancer-related gene mutations. With the rapid development of high-throughput sequencing and computing technology, high-throughput sequencing combined with optimized DNA isolation and extraction, target capture, and bioinformatics analysis makes accurate detection and analysis of ultra-low-frequency mutations possible, providing a foundation for the wide clinical application of precision tumor therapy.
Circulating tumor DNA (ctDNA) consists of small DNA fragments derived from tumor cells, about 170 bp long. It is released from tumor cells into the peripheral blood circulation and then cleaved to form endogenous single-stranded or double-stranded DNA that carries molecular mutation information consistent with the primary tumor tissue. A large number of studies show that ctDNA is highly consistent with the genomic information of tumor tissue. ctDNA assays may therefore be used as a complement to gene assays on clinical tissue samples, or as a replacement in some cases.
Because ctDNA makes up a very small fraction of cell-free DNA (cfDNA), below 0.1% in some samples, ctDNA gene mutation detection is easily affected by various interference factors (DNA extraction, library construction, targeted capture, etc.). The sequencing error rate of high-throughput sequencing is about one in a thousand, and errors of this magnitude strongly affect the accuracy of tumor gene mutation detection, especially the detection of mutations of extremely low abundance in ctDNA. Reducing the error rate of the sequencing result is therefore a key technical step in ctDNA gene mutation detection. With double-end sequencing, the two ends of the same DNA fragment are sequenced from the forward and reverse directions, so the two reads can correct each other to some extent. Moreover, when the DNA fragment is short, the two reads overlap, and using the overlap region to correct sequencing errors can improve the accuracy of ctDNA gene mutation detection.
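The scale of the problem can be illustrated with a small calculation. The sketch below uses the two figures quoted above (an error rate of one in a thousand and a ctDNA fraction of 0.1%). It further assumes, purely for illustration and not from the source, that errors in the two reads are independent and that a miscalled base is equally likely to be any of the three other bases. Errors introduced before sequencing (for example during amplification) would be shared by both reads and are outside this simple model.

```python
# Simplified error model. The two rates are the figures quoted in the text;
# independence of R1/R2 errors and equally likely wrong bases are assumptions.
SEQ_ERROR_RATE = 1e-3   # per-base sequencing error rate ("one in a thousand")
CTDNA_FRACTION = 1e-3   # ctDNA share of cfDNA in a low-abundance sample (0.1%)


def single_read_false_base_rate(error_rate: float) -> float:
    """Chance that one read shows one particular wrong base at a position."""
    return error_rate / 3.0


def concordant_false_base_rate(error_rate: float) -> float:
    """Chance that R1 and R2 both show the same particular wrong base."""
    return (error_rate / 3.0) ** 2


if __name__ == "__main__":
    single = single_read_false_base_rate(SEQ_ERROR_RATE)
    both = concordant_false_base_rate(SEQ_ERROR_RATE)
    print(f"variant fraction to detect      : {CTDNA_FRACTION:.1e}")
    print(f"noise from a single read        : {single:.1e}")
    print(f"noise surviving read agreement  : {both:.1e}")
```

Under these assumptions the single-read noise is of the same order as the variant fraction, while noise that survives agreement between both reads is several orders of magnitude lower, which is the motivation for using the overlap region.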
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for double-end correction of circulating tumor DNA sequencing data, that is, a method for base correction based on the overlap region of cfDNA double-end sequencing. It is realized by the following technical scheme:
The invention discloses a method for double-end correction of circulating tumor DNA sequencing data, which comprises the following steps:
1) cfDNA extraction, target capture library construction and sequencing:
extracting cfDNA in sample plasma by using a magnetic bead method for sample library construction; adding sequencing adapters at two ends of cfDNA molecules, wherein the sequencing adapters contain 8bp tag sequences for off-line data splitting, performing hybridization capture by using a liquid phase molecular probe, capturing target DNA fragments, and completing library construction; sequencing the constructed library by using a high-throughput sequencer, wherein the sequencing read length is 150 bp;
2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step 1) according to different label sequences, and performing quality control on the split sequencing data;
3) double-end correction of sequencing data:
performing double-end data correction on the quality control qualified sample in the step 2), wherein the specific method is as follows;
a) reverse-complementing the R2 sequence, using k-mers to search for the positions where R1 and R2 begin to match, and judging whether an overlap exists; here R1 is the forward sequencing read, R2 is the reverse sequencing read, a k-mer (Kmer) is a nucleotide fragment of length K bp obtained by breaking up a sequencing read, and the overlap is the region where the two reads overlap;
b) if an overlap exists, determining its leftmost and rightmost positions, namely the positions of the first shared k-mer and the last shared k-mer;
c) judging whether the overlap lengths of R1 and R2 are consistent, and discarding both reads of the pair if they are not;
d) judging the overlap length against a threshold of 40 bp, and performing no correction if the overlap is shorter than the threshold;
e) correcting erroneous bases in the overlap region, and discarding the sequencing reads of the fragment if the number of corrections in the same overlap exceeds 5;
4) using sequence alignment software to align the corrected sequencing data obtained in step 3) to a standard human genome, generating an alignment result file;
5) performing gene mutation detection analysis on the result file obtained in step 4) using mutation detection software;
6) performing functional annotation of the gene mutation results of step 5) using annotation software.
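Sub-steps a) and b) of step 3) can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: the function names are hypothetical, the source does not state a value for K (the default of 12 here is an arbitrary assumption), and repeated k-mers within a read are handled only naively.

```python
from typing import Optional, Tuple

# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_overlap(
    r1: str, r2: str, k: int = 12
) -> Optional[Tuple[Tuple[int, int], Tuple[int, int]]]:
    """Locate the overlap of R1 and reverse-complemented R2 via shared k-mers.

    Returns half-open (start, end) intervals in R1 and in the
    reverse-complemented R2, or None when no k-mer is shared.
    """
    r2_rc = reverse_complement(r2)

    # Index every k-mer of R1 by its first and last position.
    first_pos, last_pos = {}, {}
    for i in range(len(r1) - k + 1):
        kmer = r1[i:i + k]
        first_pos.setdefault(kmer, i)
        last_pos[kmer] = i

    # Positions in the reverse-complemented R2 whose k-mer also occurs in R1.
    shared = [j for j in range(len(r2_rc) - k + 1) if r2_rc[j:j + k] in first_pos]
    if not shared:
        return None

    left_j, right_j = shared[0], shared[-1]          # first and last shared k-mer
    left_i = first_pos[r2_rc[left_j:left_j + k]]
    right_i = last_pos[r2_rc[right_j:right_j + k]]
    return (left_i, right_i + k), (left_j, right_j + k)
```

The two returned intervals give the overlap as seen in each read; comparing their lengths corresponds to the consistency check of sub-step c).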
As a further improvement, the method for correcting the wrong base in the overlap region in step 3) of the invention is as follows:
when the corresponding bases of R1 and R2 disagree and the sequencing quality values of both bases are greater than or equal to 30, the bases at both positions are replaced by N;
when the corresponding bases of R1 and R2 disagree and one base has a sequencing quality value greater than 30 while the other is less than 30, the base with quality below 30 is replaced by the base with quality above 30;
when the corresponding bases of R1 and R2 disagree and the sequencing quality values of both bases are less than 30, the bases at both positions are replaced by N;
the above operation is applied to all fragments in turn.
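The three rules above can be expressed as a single per-position function. The sketch below is illustrative only; the function name is hypothetical, and because the text uses both "greater than or equal to 30" and "greater than 30", treating a quality of exactly 30 as high quality is an assumption made here.

```python
Q_THRESHOLD = 30  # quality cut-off stated in the text


def correct_mismatch(base1: str, qual1: int, base2: str, qual2: int):
    """Apply the correction rules to one overlap position.

    base1/qual1 come from R1, base2/qual2 from the reverse-complemented R2.
    Returns the (possibly modified) pair of bases.
    """
    if base1 == base2:
        return base1, base2            # consistent bases are left untouched

    high1 = qual1 >= Q_THRESHOLD       # assumption: exactly 30 counts as high
    high2 = qual2 >= Q_THRESHOLD

    if high1 and not high2:
        return base1, base1            # trust the high-quality R1 base
    if high2 and not high1:
        return base2, base2            # trust the high-quality R2 base
    return "N", "N"                    # both high or both low: mask with N
```

Masking with N when both bases are of high quality reflects that neither read can be preferred; the position is then uninformative for mutation detection rather than a potential false positive.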
As a further improvement, the sample plasma in step 1) of the present invention is derived from human plasma.
As a further improvement, the high-throughput sequencer in step 1) is an Illumina NextSeq CN500 sequencer, a BGISEQ-100 sequencer, a BGISEQ-1000 sequencer or a DA8600 sequencer.
As a further improvement, the sequencing mode in the step 1) is double-ended sequencing.
As a further improvement, the quality control is carried out on the split sequencing data by using fastqc software in the step 2).
As a further improvement, the software used for sequence alignment in step 4) of the present invention is BWA.
As a further improvement, the software used for mutation detection in step 5) of the present invention is varscan.
As a further improvement, the annotation software for gene mutation results in step 6) of the present invention is annovar.
The invention has the following beneficial effects:
The invention uses double-end sequencing, in which the two ends of the same DNA fragment are sequenced from the forward and reverse directions. When the fragment is short, double-end sequencing produces an overlap region in which both reads derive from the same DNA fragment. In the absence of sequencing errors, the forward and reverse reads are completely consistent across the overlap region. An algorithm built on this property corrects sequencing errors and can thereby improve the accuracy of gene mutation detection.
The length of the ctDNA is about 170bp, and double-end sequencing can generate an overlapping region.
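The size of the overlap follows from simple arithmetic on the fragment length and the read length. The sketch below assumes adapter-trimmed reads, so that a read never extends beyond the fragment; the function name is hypothetical.

```python
def expected_overlap(fragment_len: int, read_len: int = 150) -> int:
    """Number of fragment bases covered by both reads of a pair.

    Assumes adapter-trimmed reads, so each read covers at most the fragment.
    """
    covered = min(read_len, fragment_len)      # bases each read contributes
    return max(0, 2 * covered - fragment_len)  # bases covered twice


if __name__ == "__main__":
    for length in (100, 170, 250, 350):
        print(length, expected_overlap(length))
```

For a 170 bp fragment and 150 bp reads this gives 2 × 150 − 170 = 130 bp of overlap, well above the 40 bp threshold used in the correction step.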
The ctDNA content of cfDNA is very low, below 0.1% in some samples, so ctDNA gene mutation detection is easily affected by various interference factors. The method performs base correction in the sequence overlap regions of ctDNA high-throughput sequencing data. Because the overlap-region sequences of double-end sequencing should be consistent, the sequencing error rate can be reduced, which effectively improves the accuracy of ctDNA gene mutation detection and analysis, particularly for low-abundance gene mutations, reduces the false positive rate, and improves the application value of ctDNA detection in clinical treatment.
Drawings
FIG. 1: the main flow diagram of the scheme of the invention;
FIG. 2: the flow chart of step (3) of the invention is shown schematically.
Detailed Description
The technical solution of the present invention is further illustrated by the following specific examples:
(1) cfDNA extraction, target capture library construction and sequencing:
extracting cfDNA in sample plasma by using a magnetic bead method for sample library construction; the sample plasma is derived from human plasma;
adding sequencing adapters at two ends of cfDNA molecules, wherein the sequencing adapters contain 8bp tag sequences for off-line data splitting, performing hybridization capture by using a liquid phase molecular probe, capturing target DNA fragments, and completing library construction;
sequencing the constructed library by using a high-throughput sequencer, wherein the sequencing read length is 150bp, the double ends of the library are sequenced, and the high-throughput sequencer is an Illumina NextSeq CN500 sequencer, a BGISEQ-100 sequencer, a BGISEQ-1000 sequencer or a DA8600 sequencer;
(2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step (1) according to different label sequences, and performing quality control on the split sequencing data by using fastqc software;
(3) double-end correction of sequencing data:
performing double-end data correction on the QC qualified samples in the step (2), wherein the specific method is as follows;
a) Reverse-complementing the R2 (reverse sequencing read) sequence, using k-mers (Kmer: nucleotide fragments of length K bp obtained by breaking up a sequencing read) to search for the positions where R1 (forward sequencing read) and R2 begin to match, and judging whether an overlap (sequence overlap region) exists.
b) If an overlap exists, determining its leftmost and rightmost positions, namely the positions of the first shared k-mer and the last shared k-mer.
c) Judging whether the overlap lengths of R1 and R2 are consistent; if they are not, discarding both reads of the pair, which would otherwise give rise to false positive single-base insertion-deletion mutations in subsequent variant detection.
d) Judging the overlap length against a threshold of 40 bp; if the overlap is shorter than the threshold, no correction is performed.
e) Correcting erroneous bases in the overlap region; if the number of corrections in the overlap of one fragment exceeds 5, discarding the sequencing reads of that fragment. Where corresponding bases of R1 and R2 in the overlap disagree, they are corrected as follows:
1. When the sequencing quality values of both bases are greater than or equal to 30, the bases at both positions are replaced by N.
2. When one base has a sequencing quality value greater than 30 and the other less than 30, the base with quality below 30 is replaced by the base with quality above 30.
3. When the sequencing quality values of both bases are less than 30, the bases at both positions are replaced by N.
The above operation is applied to all fragments in turn.
(4) Aligning the corrected sequencing data obtained in step (3) to a standard human genome using BWA software to generate an alignment result file;
(5) performing gene mutation detection analysis on the result file obtained in step (4) using varscan software;
(6) performing functional annotation of the gene mutation results of step (5) using annovar software.
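The decisions in sub-steps c) to e) of step (3) can be sketched as one function operating on the two overlap segments. This is an illustrative sketch under stated assumptions: the function name and status strings are hypothetical, the inputs are assumed to be the overlap segment of R1 and the matching segment of the reverse-complemented R2 (with R2 qualities reversed accordingly), every modified position is counted as one correction, and a quality of exactly 30 is treated as high quality.

```python
from typing import List, Optional, Tuple

MIN_OVERLAP = 40      # threshold from sub-step d)
MAX_CORRECTIONS = 5   # limit from sub-step e)
Q_THRESHOLD = 30      # quality cut-off from the correction rules


def process_overlap(
    seq1: str, qual1: List[int], seq2: str, qual2: List[int]
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (status, corrected segments) for one read pair's overlap."""
    # c) inconsistent overlap lengths: discard the pair
    if len(seq1) != len(seq2):
        return "discarded", None
    # d) overlap shorter than the threshold: keep the pair without correction
    if len(seq1) < MIN_OVERLAP:
        return "uncorrected", (seq1, seq2)

    out1, out2 = list(seq1), list(seq2)
    corrections = 0
    for i, (b1, b2) in enumerate(zip(seq1, seq2)):
        if b1 == b2:
            continue
        corrections += 1   # assumption: each modified position counts once
        high1 = qual1[i] >= Q_THRESHOLD
        high2 = qual2[i] >= Q_THRESHOLD
        if high1 and not high2:
            out2[i] = b1               # low-quality R2 base takes the R1 base
        elif high2 and not high1:
            out1[i] = b2               # low-quality R1 base takes the R2 base
        else:
            out1[i] = out2[i] = "N"    # both high or both low: mask

    # e) too many corrections in one overlap: discard the pair
    if corrections > MAX_CORRECTIONS:
        return "discarded", None
    # "corrected" is also returned when the overlap needed no changes
    return "corrected", ("".join(out1), "".join(out2))
```

A caller would write the corrected segments back into the two reads before alignment, and drop pairs whose status is "discarded".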
Applying the technical scheme of the invention, one group of cfDNA reference standards with 8 known mutation sites and a mutation frequency of 0.5% was analyzed to verify the accuracy of the detection results. The specific process was as follows:
(1) cfDNA extraction, target capture library construction and sequencing:
extracting and purifying the cfDNA of the reference standard using a commercial nucleic acid extraction kit, and taking 30 ng of purified cfDNA directly for sample library construction without fragmenting it;
adding sequencing adapters at both ends of the 100-350 bp cfDNA molecules, wherein the sequencing adapters contain 8 bp tag sequences used to distinguish the data of different samples, performing hybridization capture with a liquid-phase molecular probe, capturing the target DNA fragments, and completing library construction;
finally, performing double-end sequencing on the constructed library using an Illumina NextSeq CN500 sequencer, with a sequencing read length of 150 bp;
(2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step (1) according to different label sequences, and performing quality control on the split sequencing data by using fastqc software;
(3) double-end correction of sequencing data:
performing double-end data correction on the quality control qualified sample in the step (2), wherein the specific method is as follows;
a) Reverse-complementing the R2 sequence, using k-mers to search for the positions where R1 and R2 begin to match, and judging whether an overlap exists.
b) If an overlap exists, determining its leftmost and rightmost positions, namely the positions of the first shared k-mer and the last shared k-mer.
c) Judging whether the overlap lengths of R1 and R2 are consistent; if they are not, discarding both reads of the pair, which would otherwise give rise to false positive single-base insertion-deletion mutations in subsequent variant detection.
d) Judging the overlap length against a threshold of 40 bp; if the overlap is shorter than the threshold, no correction is performed.
e) Correcting erroneous bases in the overlap region; if the number of corrections in the overlap of one fragment exceeds 5, discarding the sequencing reads of that fragment. Where corresponding bases of R1 and R2 in the overlap disagree, they are corrected as follows:
1. When the sequencing quality values of both bases are greater than or equal to 30, the bases at both positions are replaced by N.
2. When one base has a sequencing quality value greater than 30 and the other less than 30, the base with quality below 30 is replaced by the base with quality above 30.
3. When the sequencing quality values of both bases are less than 30, the bases at both positions are replaced by N.
The above operation is applied to all fragments in turn;
(4) aligning the corrected sequencing data obtained in step (3) to a standard human genome using BWA software to generate an alignment result file;
(5) performing gene mutation detection analysis on the result file obtained in step (4) using varscan software;
(6) functional annotation of the gene mutation results of step (5) was performed using annovar software.
The detection of the 8 known mutation sites in the mutation detection results is summarized in Table 1: the 8 gene mutation sites were accurately detected in the 3 HD778 samples. Without double-end correction of the sequencing data, 2 low-frequency false positive mutations were detected in each sample; after double-end correction, no low-frequency false positive mutations were detected in any sample. The double-end correction method provided by the invention therefore effectively improves the accuracy of low-frequency mutation detection.
[Table 1: provided as image BDA0002425979440000071 in the original publication; not reproduced here]
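A comparison of this kind, between the mutations called in a sample and the known mutation sites of a reference standard, can be sketched as a set comparison. The function name and the variant representation (chromosome, position, reference base, alternate base) are assumptions for illustration, and the values in the usage example are placeholders, not data from the experiment.

```python
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def compare_calls(
    called: Iterable[Variant], known: Iterable[Variant]
) -> Dict[str, List[Variant]]:
    """Split called variants into detected known sites and false positives."""
    called_set, known_set = set(called), set(known)
    return {
        "detected": sorted(called_set & known_set),        # known sites that were called
        "missed": sorted(known_set - called_set),          # known sites not called
        "false_positive": sorted(called_set - known_set),  # calls outside the known set
    }


if __name__ == "__main__":
    # Placeholder variants, purely hypothetical.
    known = [("chrA", 100, "C", "T"), ("chrB", 200, "G", "A")]
    called = [("chrA", 100, "C", "T"), ("chrB", 200, "G", "A"), ("chrC", 300, "A", "G")]
    result = compare_calls(called, known)
    print({name: len(items) for name, items in result.items()})
```

Running the comparison once on calls made without correction and once on calls made after correction gives the two false positive counts that the paragraph above contrasts.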
Wherein, cfDNA: cell-free DNA;
magnetic bead method: specific adsorption of DNA using magnetic beads;
sequencing quality value: a measure of the probability that a base was called incorrectly; the higher the sequencing quality value, the better the sequencing quality;
Illumina NextSeq CN500, BGISEQ-100, BGISEQ-1000 and DA8600 are models of high-throughput sequencers;
double-end sequencing: sequencing both ends of the DNA fragment;
fastqc, BWA, varscan and annovar are software names; no industry-wide Chinese names exist for them, and they are referred to directly in English or by abbreviation.
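The sequencing quality value is conventionally the Phred score Q = −10·log10(P), where P is the probability that the base call is wrong, so the threshold of 30 used in the correction rules corresponds to P = 0.001. The sketch below converts between the two and decodes a FASTQ quality string; the Phred+33 character encoding is an assumption about the data format (it is the common convention for current FASTQ files), and the function names are hypothetical.

```python
from typing import List


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality value to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def decode_quality(qual_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred values (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual_string]


if __name__ == "__main__":
    for q in decode_quality("I?5"):
        print(q, phred_to_error_probability(q))
```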
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and refinements without departing from the core technical features of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (9)

1. A method for paired end correction of circulating tumor DNA sequencing data, comprising the steps of:
1) cfDNA extraction, target capture library construction and sequencing:
extracting cfDNA in sample plasma by using a magnetic bead method for sample library construction; adding sequencing adapters at two ends of cfDNA molecules, wherein the sequencing adapters contain 8bp tag sequences for off-line data splitting, performing hybridization capture by using a liquid phase molecular probe, capturing target DNA fragments, and completing library construction; sequencing the constructed library by using a high-throughput sequencer, wherein the sequencing read length is 150 bp;
2) sequencing data quality control:
splitting sequencing data of different samples sequenced in the step 1) according to different label sequences, and performing quality control on the split sequencing data;
3) double-end correction of sequencing data:
performing double-end data correction on the quality control qualified sample in the step 2), wherein the specific method is as follows;
a) reverse-complementing the R2 sequence, using k-mers to search for the positions where R1 and R2 begin to match, and judging whether an overlap exists; the R1 is a forward sequencing read, the R2 is a reverse sequencing read, the k-mer (Kmer) is a nucleotide fragment of length K bp obtained by breaking up a sequencing read, and the overlap is the region where the two reads overlap;
b) if the overlap exists, judging the positions of the overlap at the leftmost end and the rightmost end, namely the positions of the first same Kmer and the last same Kmer;
c) judging whether the overlap lengths of R1 and R2 are consistent, and discarding the two fragments if the overlap lengths are inconsistent;
d) judging the overlap length, setting a threshold value to be 40bp, and if the overlap length is smaller than the threshold value, not correcting;
e) correcting the wrong base in the overlap region, and if the correction quantity of the same overlap is more than 5, discarding the sequencing sequence of the segment;
4) using sequence comparison software to perform sequence comparison on the corrected sequencing data obtained in the step 3) and a standard human genome to generate a comparison result file;
5) performing gene mutation detection analysis on the result file obtained in the step 4) by using mutation detection software;
6) functional annotation of the gene mutation results of step 5) was performed using annotation software.
2. The method for double-ended correction of circulating tumor DNA sequencing data according to claim 1, wherein the step 3) corrects the overlap region error bases as follows:
when the corresponding bases of R1 and R2 disagree and the sequencing quality values of both bases are greater than or equal to 30, the bases at both positions are replaced by N;
when the corresponding bases of R1 and R2 disagree and one base has a sequencing quality value greater than 30 while the other is less than 30, the base with quality below 30 is replaced by the base with quality above 30;
when the corresponding bases of R1 and R2 disagree and the sequencing quality values of both bases are less than 30, the bases at both positions are replaced by N;
the above operation is applied to all fragments in turn.
3. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1 or 2, wherein the sample plasma of step 1) is derived from human plasma.
4. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, wherein the high-throughput sequencer in step 1) is an Illumina NextSeq CN500 sequencer, a BGISEQ-100 sequencer, a BGISEQ-1000 sequencer or a DA8600 sequencer.
5. The method for paired end correction of circulating tumor DNA sequencing data according to claim 1, 2 or 4, wherein the sequencing mode in step 1) is paired end sequencing.
6. The bioinformatic processing method for circulating tumor DNA analysis according to claim 1, wherein the step 2) is performed by quality control of the resolved sequencing data using fastqc software.
7. The method for processing bioinformation for circulating tumor DNA analysis according to claim 1, wherein the software for sequence alignment in step 4) is BWA.
8. The bioinformatic processing method for circulating tumor DNA analysis according to claim 1, wherein the software used for mutation detection in step 5) is varscan.
9. The bioinformatic processing method for circulating tumor DNA analysis according to claim 1, wherein the gene mutation result annotation software in step 6) is annovar.
CN202010220739.9A 2020-03-26 2020-03-26 Method for double-end correction of circulating tumor DNA sequencing data Pending CN111321209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220739.9A CN111321209A (en) 2020-03-26 2020-03-26 Method for double-end correction of circulating tumor DNA sequencing data

Publications (1)

Publication Number Publication Date
CN111321209A true CN111321209A (en) 2020-06-23

Family

ID=71169626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220739.9A Pending CN111321209A (en) 2020-03-26 2020-03-26 Method for double-end correction of circulating tumor DNA sequencing data

Country Status (1)

Country Link
CN (1) CN111321209A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112687339A (en) * 2021-01-21 2021-04-20 深圳吉因加医学检验实验室 Method and device for counting sequence errors in plasma DNA fragment sequencing data
CN115083521A (en) * 2022-07-22 2022-09-20 角井(北京)生物技术有限公司 Method and system for identifying tumor cell group in single cell transcriptome sequencing data
CN115831233A (en) * 2023-02-07 2023-03-21 杭州联川基因诊断技术有限公司 mTag-based targeted sequencing data preprocessing method, equipment and medium
CN116356001A (en) * 2023-02-07 2023-06-30 江苏先声医学诊断有限公司 Dual background noise mutation removal method based on blood circulation tumor DNA

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107523563A (en) * 2017-09-08 2017-12-29 杭州和壹基因科技有限公司 A kind of Bioinformatics method for Circulating tumor DNA analysis
CN108595918A (en) * 2018-01-15 2018-09-28 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
CN109762881A (en) * 2019-01-31 2019-05-17 中山拓普基因科技有限公司 It is a kind of for detecting the Bioinformatic methods in the ultralow frequency mutational site in tumor patient blood ctDNA

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112687339A (en) * 2021-01-21 2021-04-20 深圳吉因加医学检验实验室 Method and device for counting sequence errors in plasma DNA fragment sequencing data
CN115083521A (en) * 2022-07-22 2022-09-20 角井(北京)生物技术有限公司 Method and system for identifying tumor cell group in single cell transcriptome sequencing data
CN115831233A (en) * 2023-02-07 2023-03-21 杭州联川基因诊断技术有限公司 mTag-based targeted sequencing data preprocessing method, equipment and medium
CN115831233B (en) * 2023-02-07 2023-05-16 杭州联川基因诊断技术有限公司 Targeted sequencing data preprocessing method, equipment and medium based on mTag
CN116356001A (en) * 2023-02-07 2023-06-30 江苏先声医学诊断有限公司 Dual background noise mutation removal method based on blood circulation tumor DNA
CN116356001B (en) * 2023-02-07 2023-12-15 江苏先声医学诊断有限公司 Dual background noise mutation removal method based on blood circulation tumor DNA

Similar Documents

Publication Publication Date Title
CN111321209A (en) Method for double-end correction of circulating tumor DNA sequencing data
CN105793689B (en) Methods and systems for genotyping genetic samples
CN109767810B (en) High-throughput sequencing data analysis method and device
CN106156543B (en) Statistical method for tumor ctDNA information
CN113035273B (en) Rapid and ultrahigh-sensitivity DNA fusion gene detection method
CN107267613B (en) Sequencing data processing system and SMN gene detection system
CN113903401B (en) ctDNA length-based analysis method and system
WO2016049878A1 (en) Snp profiling-based parentage testing method and application
US20210375397A1 (en) Methods and systems for determining fusion events
CN115083521B (en) Method and system for identifying tumor cell group in single cell transcriptome sequencing data
Larson et al. A clinician’s guide to bioinformatics for next-generation sequencing
CN112927755B (en) Method and system for identifying the source of cfDNA variation
BR112021006402A2 (en) Sequence-graph based tool to determine variation in short tandem repeat regions
Woerner et al. Reducing noise and stutter in short tandem repeat loci with unique molecular identifiers
CN109461473B (en) Method and device for acquiring concentration of free DNA of fetus
CN108728515A (en) Library construction and sequencing data analysis method for detecting low-frequency ctDNA mutations using the duplex method
CN116434843A (en) Base sequencing quality assessment method
WO2020132628A1 (en) Methods, compositions, and systems for improving recovery of nucleic acid molecules
CN110819700A (en) Method for constructing small pulmonary nodule computer-aided detection model
KR20220071122A (en) Method for Detecting Cancer and Predicting prognosis Using Nucleic Acid Fragment Ratio
WO2019129200A1 (en) C-site extraction method and apparatus
Isakov et al. Deep sequencing data analysis: challenges and solutions
CN115620809B (en) Nanopore sequencing data analysis method and device, storage medium and application
KR20190017161A (en) Method for increasing read data analysis accuracy in amplicon based NGS by using primer remover
US20170226588A1 (en) Systems and methods for dna amplification with post-sequencing data filtering and cell isolation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination