CN112397150A - ctDNA methylation level prediction device and method based on target region capture sequencing - Google Patents
- Publication number
- CN112397150A (application number CN202110072090.5A)
- Authority
- CN
- China
- Prior art keywords
- file
- reads
- methylation level
- filtering
- bam
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Abstract
The invention provides a ctDNA methylation level prediction device and method based on target region capture sequencing. The device comprises: a FASTQ file processing module, used for acquiring a FASTQ file from capture sequencing of a ctDNA sample to be tested and processing the FASTQ file to obtain a filtered FASTQ file; a sample alignment module, used for aligning the sequences in the filtered FASTQ file to a reference genome and removing duplicates to obtain a corresponding Bam file; a reads-level filtering module, used for filtering the reads in the generated Bam file one by one according to a preset C-T conversion rate to obtain a filtered Bam file; and a methylation level prediction module, used for further filtering the Bam file according to the target region Bed file and a preset number of CpG sites covered by each read, and predicting the methylation level of the CpG sites from the remaining reads.
Description
Technical Field
The invention relates to the technical field of biomedicine, in particular to a ctDNA methylation level prediction device and method.
Background
Circulating tumor DNA (ctDNA) consists of small DNA fragments derived from apoptotic and necrotic tumor cells. Released from tumor cells into the peripheral blood circulation, it is endogenous single-stranded or double-stranded DNA that carries molecular mutation information consistent with the primary tumor tissue. A ctDNA sample can therefore serve as a substitute for a clinical tissue sample in gene detection.
Studies have shown that epigenetic changes are among the most common molecular changes in tumor formation. DNA methylation is a widely studied epigenetic modification that plays an important role in regulating gene expression. Generally, DNA methylation refers to the addition of a methyl group to the 5' carbon of cytosine by DNA methyltransferase (DNMT), forming 5-methylcytosine (5mC). Research shows that DNA methylation is involved in cellular activities such as cell differentiation and tissue-specific gene expression, and that abnormal DNA methylation can cause diseases such as dysplasia and tumors. DNA methylation is therefore of great significance both to ontogeny and to the mechanisms of tumor occurrence and development.
With the continuous development of next-generation sequencing technology, its application in the fields of human genetic disease and cancer diagnosis is increasingly common, and methylation sequencing of ctDNA has become an important means of studying the mechanisms of tumor occurrence and development. However, the human reference genome is about 3 Gb in size, so whole-genome methylation sequencing is too costly and produces a very large data volume. Target region capture sequencing has therefore become an ideal method in scientific research.
The traditional quality control process for DNA methylation capture data is generally as follows: the FASTQ-format data are aligned to a human reference genome; high-quality, uniquely aligned reads are retained and duplicate reads are removed; the base content ratio, capture efficiency and sequencing depth of the retained reads are then evaluated to obtain a Bam file of the ctDNA sample to be tested; finally, the Bam file is analyzed with third-party software to obtain the methylation level of the ctDNA sample at CpG sites (cytosine-phosphate-guanine sites, namely sites in a DNA sequence where a cytosine is immediately followed by a guanine), and these methylation levels are used directly in subsequent scientific research and analysis.
During the above DNA methylation capture sequencing of the target region, bisulfite treatment is required to convert all unmethylated cytosines (C) to uracil (U), and the uracil is then read as thymine (T) after PCR (polymerase chain reaction, a technique for amplifying specific DNA fragments), whereas methylated cytosines are not altered in this process. Incomplete conversion of unmethylated cytosine can occur in this process, which biases the predicted methylation level of the ctDNA sample to be tested. Moreover, because the content of ctDNA is very low, the measured methylation level of a ctDNA sample is more easily affected by the C-T conversion rate, which further reduces the accuracy of the detection result.
Disclosure of Invention
In view of the above problems, the invention provides a ctDNA methylation level prediction device and method based on target region capture sequencing, which effectively overcome the low accuracy and large data quality deviation of conventional ctDNA methylation level prediction.
The technical scheme provided by the invention is as follows:
In one aspect, the present invention provides a ctDNA methylation level prediction device based on target region capture sequencing, comprising:
a FASTQ file processing module, used for acquiring a FASTQ file from capture sequencing of a ctDNA sample to be tested, and performing a preprocessing operation on the FASTQ file to obtain a filtered FASTQ file;
a sample alignment module, used for aligning the sequences in the FASTQ file obtained by the FASTQ file processing module to a reference genome and removing duplicates to obtain a corresponding Bam file;
a reads-level filtering module, used for filtering the reads in the Bam file generated by the sample alignment module one by one according to a preset C-T conversion rate to obtain a filtered Bam file;
and a methylation level prediction module, used for further filtering the Bam file output by the reads-level filtering module according to the target region Bed file and a preset number of CpG sites covered by each read, and predicting the methylation level of the CpG sites from the remaining reads.
In this embodiment, FASTQ is a common file type for high-throughput sequencing data. Reads are the genome or transcriptome sequence fragments detected by a sequencer. Methylated C bases are divided by sequence context into three types, CpG, CHG and CHH, where H denotes any base other than G, namely A, C or T. In CpG, the base immediately downstream of the methylated C is G; in CHG, the two bases downstream of the methylated C are H followed by G; in CHH, both bases downstream of the methylated C are H. CHG and CHH are collectively called the non-CpG context. The Bam file stores the result of mapping the sequenced reads back to the reference genome. The C-T conversion rate is the proportion of C bases at non-CpG sites in the original sequence that are converted to T.
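The three contexts defined above can be illustrated with a minimal Python sketch. The function name `cytosine_context` is hypothetical and is not part of the patent; the sketch assumes a plain 5'-to-3' sequence string and returns no context when the downstream bases are missing or ambiguous (N).

```python
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the base at index i of a 5'->3' sequence as CpG, CHG or CHH.

    Hypothetical helper for illustration; returns None if the base is not C
    or if the downstream bases needed to decide the context are unavailable.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]  # up to two bases downstream of the C
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"
    if len(downstream) < 2 or "N" in downstream:
        return None  # context cannot be determined
    if downstream[1] == "G":
        return "CHG"  # first downstream base is H (A, C or T), second is G
    return "CHH"      # both downstream bases are H
```

For example, in the sequence `CCGG` the first C is in a CHG context and the second C is in a CpG context.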
Further preferably, in the FASTQ file processing module, the preprocessing operation performed on the acquired FASTQ file includes removing adapters and low-quality reads; and/or,
in the sample alignment module, the sequences in the FASTQ file obtained by the FASTQ file processing module are aligned to the human reference genome and to the internal-control lambda DNA reference genome respectively and are deduplicated, generating a Bam file for the human reference genome together with alignment reports before and after deduplication, and a Bam file for the internal-control lambda DNA reference genome together with alignment reports before and after deduplication.
Further preferably, the reads-level filtering module comprises:
a methylation counting unit, used for reading the reads in the Bam file generated by the sample alignment module line by line and counting the numbers of methylated and unmethylated bases in the non-CpG context;
a C-T conversion rate calculation unit, used for calculating the C-T conversion rate of each read according to the number of methylated non-CpG context bases and the total number of non-CpG context bases;
and a first filtering unit, used for filtering out reads in the Bam file whose C-T conversion rate is smaller than the preset C-T conversion rate, to obtain a filtered Bam file.
Further preferably, the methylation level prediction module comprises:
a second filtering unit, used for filtering out, according to the target region Bed file, the known SNP sites in the dbSNP database and the SNP sites arising from specific variation, to obtain the CpG sites of the ctDNA sample to be tested; and used for further filtering the Bam file output by the reads-level filtering module according to the CpG sites so obtained and the preset number of CpG sites covered by each read;
and a methylation level calculation unit, used for calculating the methylation level of the CpG sites from the reads remaining in the Bam file after filtering by the second filtering unit.
In another aspect, the present invention provides a ctDNA methylation level prediction method based on target region capture sequencing, comprising:
acquiring a FASTQ file from capture sequencing of a ctDNA sample to be tested, and performing a preprocessing operation on the FASTQ file to obtain a filtered FASTQ file;
aligning the sequences in the filtered FASTQ file to a reference genome and removing duplicates to obtain a corresponding Bam file;
filtering the reads in the generated Bam file one by one according to a preset C-T conversion rate to obtain a filtered Bam file;
and further filtering the filtered Bam file according to the target region Bed file and a preset number of CpG sites covered by each read, and predicting the methylation level of the CpG sites from the remaining reads.
Further preferably, acquiring the FASTQ file from capture sequencing of the ctDNA sample to be tested and performing the preprocessing operation on the FASTQ file to obtain the filtered FASTQ file includes: removing adapters and low-quality reads from the acquired FASTQ file; and/or,
aligning the sequences in the filtered FASTQ file to a reference genome and removing duplicates to obtain the corresponding Bam file includes: aligning the sequences in the filtered FASTQ file to a human reference genome and to an internal-control lambda DNA reference genome respectively and removing duplicates, generating a Bam file for the human reference genome together with alignment reports before and after deduplication, and a Bam file for the internal-control lambda DNA reference genome together with alignment reports before and after deduplication.
Further preferably, filtering the reads in the generated Bam file one by one according to the preset C-T conversion rate to obtain the filtered Bam file includes:
reading the reads in the Bam file line by line, and counting the numbers of methylated and unmethylated bases in the non-CpG context;
calculating the C-T conversion rate of each read according to the number of methylated non-CpG context bases and the total number of non-CpG context bases;
and filtering out reads in the Bam file whose C-T conversion rate is smaller than the preset C-T conversion rate, to obtain the filtered Bam file.
Further preferably, further filtering the filtered Bam file according to the target region Bed file and the preset number of CpG sites covered by each read, and predicting the methylation level of the CpG sites from the remaining reads, includes:
filtering out, according to the target region Bed file, the known SNP sites in the dbSNP database and the SNP sites arising from specific variation, to obtain the CpG sites of the ctDNA sample to be tested;
further filtering the Bam file according to the CpG sites so obtained and the preset number of CpG sites covered by each read;
and calculating the methylation level of the CpG sites from the reads remaining in the filtered Bam file.
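The site filtering and methylation level calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not give the preset number of covered CpG sites or say whether it is a minimum, so the sketch treats it as a minimum with a hypothetical default (`min_cpgs=3`); the function name and the data layout (each read as a mapping from CpG position to a methylated/unmethylated call) are likewise hypothetical. The level per site is taken as methylated reads divided by all covering reads, the ratio the description attributes to common methods.

```python
from collections import defaultdict


def methylation_levels(reads, target_cpgs, min_cpgs=3):
    """Per-site methylation level from reads that cover enough target CpG sites.

    reads:       iterable of dicts {cpg_position: True if methylated else False}
    target_cpgs: set of CpG positions kept after target-region and SNP filtering
    min_cpgs:    assumed minimum number of target CpG sites a read must cover
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for calls in reads:
        # Keep only calls at retained CpG sites.
        covered = {pos: m for pos, m in calls.items() if pos in target_cpgs}
        if len(covered) < min_cpgs:
            continue  # read covers too few CpG sites; drop it
        for pos, is_methylated in covered.items():
            total[pos] += 1
            methylated[pos] += int(is_methylated)
    return {pos: methylated[pos] / total[pos] for pos in sorted(total)}


# SNP filtering reduces to a set difference on candidate CpG positions.
candidate_cpgs = {100, 110, 120, 130, 140}
snp_positions = {140}  # hypothetical positions overlapping known SNPs
target = candidate_cpgs - snp_positions
reads = [
    {100: True, 110: True, 120: False},
    {100: False, 110: True, 120: True, 130: True},
    {100: True, 110: True},  # covers only two CpG sites, so it is dropped
]
levels = methylation_levels(reads, target, min_cpgs=3)
```

In a real pipeline the positions would also carry the chromosome name, for instance as `(chrom, pos)` tuples.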
In another aspect, the present invention provides a terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the above ctDNA methylation level prediction method based on target region capture sequencing.
In another aspect, the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program is configured to, when executed by a processor, implement any of the above-mentioned steps of the ctDNA methylation level prediction method based on target region capture sequencing.
The ctDNA methylation level prediction device and method based on target region capture sequencing provided by the invention can at least bring the following beneficial effects:
1. On the basis of the traditional methylation data quality control process, the influence of the C-T conversion rate on the subsequently predicted methylation level is taken into account, and strict screening standards ensure that the filtered methylation data are more reliable. Specifically, in view of the particular nature of the ctDNA sample to be tested, the C-T conversion rate of each read is calculated, and noise reads caused by a low C-T conversion rate are filtered out, which greatly improves the reliability of the methylation data and lays a foundation for the subsequent methylation level prediction.
2. On the basis of the common method for predicting the methylation level of CpG sites, a stricter prediction standard is adopted, so that the methylation level prediction is more accurate. Specifically, the methylation state of the reads covering each CpG site is considered, and reads of low reliability are filtered out, so that the methylation level prediction is more accurate and provides a reliable data basis for scientific research.
Drawings
The foregoing features, technical features, advantages and embodiments are further described in the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.
FIG. 1 is a schematic diagram of the ctDNA methylation level prediction device based on target region capture sequencing according to the present invention;
FIG. 2 is a schematic flow chart of the ctDNA methylation level prediction method based on target region capture sequencing according to the present invention;
FIG. 3 is a schematic structural diagram of a terminal device in the present invention.
Reference numerals:
100: ctDNA methylation level prediction device; 110: FASTQ file processing module; 120: sample alignment module; 130: reads-level filtering module; 140: methylation level prediction module.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
In the prior art, quality detection of methylation capture data of a target region is mainly focused on the comparison rate with a reference genome, the base distribution ratio, the base content ratio, the capture efficiency and the sequencing depth, the C-T conversion rate of a ctDNA sample to be detected is not considered, and due to the library establishment mode of methylation sequencing of the target region and the particularity of the ctDNA sample, the C-T conversion rate may cause incomplete conversion of unmethylated cytosine, so that a large prediction deviation is generated on the methylation level of the ctDNA sample. In addition, the current software for predicting the methylation level of CpG sites is uneven, most algorithms for predicting the methylation level of CpG sites focus on dividing the number of reads with methylation by the sum of the numbers of reads without methylation and with methylation, and do not consider the number and state of CpG sites contained in each read, which also causes the prediction deviation of the methylation level of ctDNA samples, and cannot guarantee the accuracy and reliability of data results, thereby affecting data interpretation. Based on the fact, the invention provides a brand-new ctDNA methylation level prediction device and method based on target area capture sequencing, the accuracy of methylation level prediction is improved, and reliable data basis is provided for scientific research.
A first embodiment of the present invention, as shown in fig. 1, is a ctDNA methylation level prediction apparatus 100 based on target region capture sequencing, comprising: the FASTQ file processing module 110 is used for acquiring a FASTQ file for capturing and sequencing a ctDNA sample to be detected, and performing preprocessing operation on the FASTQ file to obtain a filtered FASTQ file; a to-be-detected sample comparison module 120, configured to compare and deduplicate a gene sequence in the FASTQ file obtained by the FASTQ file processing module 110 with a reference genome to obtain a corresponding Bam file; the reads horizontal filtering module 130 is configured to filter reads in the Bam file generated by the to-be-detected sample comparison module 120 one by one according to a preset C-T conversion rate to obtain a filtered Bam file; and the methylation level prediction module 140 is configured to further filter the Bam file output by the reads level filtering module 130 according to the target area Bed file and the preset number of covered CpG sites in each reads, and predict the methylation level of the CpG sites according to the remaining reads.
In the ctDNA methylation level predicting apparatus 100, first, the FASTQ file processing module 110 performs operations of removing linkers and low-quality reads on the acquired FASTQ file to obtain FASTQ format data that does not include linkers and low-quality bases. Then, the to-be-detected sample comparison module 120 compares the gene sequence in the FASTQ file obtained by the FASTQ file processing module 110 with the reference genome and removes duplication, and retains high-quality and non-duplicated reads, so as to obtain a Bam file of the to-be-detected ctDNA sample. Then, a reads horizontal filtering module 130 evaluates the C-T conversion rate of non-CpG context of reads in the obtained Bam file, and filters out unqualified reads to obtain a Bam file which can be used for subsequent analysis; and finally, filtering and analyzing the Bam file by a methylation level prediction module 140 to obtain accurate methylation level data of the CpG sites of the ctDNA sample to be detected.
After the FASTQ file processing module 110 obtains the FASTQ file from capture sequencing of the ctDNA sample to be detected, adapters and low-quality reads are removed with the adapter-removal software Trimmomatic to obtain a filtered FASTQ file, and FASTQC (quality control software for high-throughput sequencing data) is used to compute statistics on the data volume, base quality distribution and base content ratio of the ctDNA sample to be detected. Specifically, after the adapter sequence is clipped, bases with a base quality below 20 at the beginning and end of the remaining portion are trimmed; a sliding window of size 5 is then moved from the 5' end of each read and the average base quality in the window is calculated; if the average base quality in a window is below 20, the read is trimmed from that window onward; and the number of bases remaining after trimming is required to exceed 75.
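The trimming rules described above can be illustrated with a minimal Python sketch. It is a simplified re-implementation for explanation only, not Trimmomatic itself; the function name `trim_read` and the representation of a read as a list of Phred quality scores are assumptions made for this example, and adapter clipping is assumed to have been done already.

```python
from typing import List, Optional


def trim_read(quals: List[int], min_q: int = 20, window: int = 5,
              min_len_exclusive: int = 75) -> Optional[range]:
    """Return the range of retained base indices, or None if the read is dropped.

    Hypothetical helper: quals holds one Phred score per base, after adapter clipping.
    """
    start, end = 0, len(quals)
    # Trim bases with quality below min_q from the beginning of the read.
    while start < end and quals[start] < min_q:
        start += 1
    # Trim bases with quality below min_q from the end of the read.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    # Slide a window from the 5' end; cut the read at the first window
    # whose average quality is below min_q.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_q:
            end = i
            break
    # The text requires the remaining length to exceed 75 bases.
    if end - start > min_len_exclusive:
        return range(start, end)
    return None


# Two low-quality leading bases and three trailing ones are removed.
kept = trim_read([10] * 2 + [30] * 100 + [5] * 3)
print(kept)  # range(2, 102)
```

A read with a low-quality stretch in the middle is cut at that stretch and is then discarded if too few bases remain.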
The to-be-detected sample comparison module 120 uses the genome alignment tool Bismark (alignment software that locates sequencing reads in a reference sequence and outputs a result file in Bam format) to align the gene sequences in the FASTQ file obtained by the FASTQ file processing module 110 to the human reference genome and to the internal reference lambda DNA reference genome respectively and to remove duplicates. This generates, for the human reference genome, a Bam file, an alignment report before deduplication and an alignment report after deduplication, and, for the internal reference lambda DNA reference genome, a Bam file, an alignment report before deduplication and an alignment report after deduplication. The aligned Bam files are then sorted and marked using SAMtools and Picard. The inputs to this process are the raw data path of the ctDNA sample to be detected and the name of the ctDNA sample to be detected.
The inputs to the reads level filtering module 130 are the path of the Bam file of the ctDNA sample to be detected after alignment to the reference genome and deduplication, and the minimum required C-T conversion rate of the non-CpG context. In the filtering process, a methylation number statistical unit first reads the reads in the Bam file generated by the to-be-detected sample comparison module 120 line by line and, based on the base actually observed at each site whose original sequence is a C base in each read, counts the numbers of methylated and unmethylated bases in the non-CpG context. The C-T conversion rate calculating unit then calculates the C-T conversion rate of each read from the number of methylated non-CpG context bases and the total number of non-CpG context bases (the sum of the numbers of methylated and unmethylated bases). Finally, the first filtering unit filters out the reads in the Bam file whose C-T conversion rate is below the preset C-T conversion rate, that is, reads that do not meet the minimum requirement for the non-CpG context C-T conversion rate, and outputs the filtered Bam file, the C-T conversion rate of the filtered ctDNA sample to be detected, and the reads data volume of the filtered ctDNA sample to be detected.
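As an illustration of this per-read filter, the following Python sketch computes a non-CpG conversion rate from a Bismark-style methylation call string. Several points are assumptions not stated in the text: that the calls are taken from Bismark's per-read call string (where `X`/`H` mark methylated and `x`/`h` unmethylated cytosines in CHG/CHH context), that the conversion rate is the unmethylated fraction of non-CpG cytosines (one minus the methylated fraction), and that reads without any non-CpG cytosine are discarded. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Bismark call characters for non-CpG contexts (CHG: X/x, CHH: H/h).
NON_CPG_METHYLATED = set("XH")
NON_CPG_UNMETHYLATED = set("xh")


def non_cpg_conversion_rate(call_string: str) -> Optional[float]:
    """C-T conversion rate of one read, or None if it has no non-CpG cytosines."""
    methylated = sum(c in NON_CPG_METHYLATED for c in call_string)
    unmethylated = sum(c in NON_CPG_UNMETHYLATED for c in call_string)
    total = methylated + unmethylated
    if total == 0:
        return None
    # Assumption: converted (unmethylated) fraction = 1 - methylated / total.
    return unmethylated / total


def filter_by_conversion(reads: List[Tuple[str, str]], min_rate: float) -> List[str]:
    """Keep names of reads whose conversion rate is at least min_rate."""
    kept = []
    for name, call_string in reads:
        rate = non_cpg_conversion_rate(call_string)
        # Assumption: reads with no non-CpG cytosine cannot be assessed and are dropped.
        if rate is not None and rate >= min_rate:
            kept.append(name)
    return kept


reads = [("r1", "..h.x.Z.h.."), ("r2", "h.H.x.X"), ("r3", "..Z..z..")]
print(filter_by_conversion(reads, min_rate=0.95))  # ['r1']
```

CpG calls (`Z`/`z`) are deliberately ignored here, because methylation at CpG sites is the biological signal, whereas residual methylation calls in non-CpG context are treated as incomplete conversion.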
The inputs to the methylation level prediction module 140 are the path of the Bam file obtained after filtering by the reads level filtering module 130, the target region Bed file, and the minimum number of CpG sites that each read must cover. In the prediction process, the second filtering unit first uses BisSNP software (software for analyzing methylation data, which can be used to identify methylation sites and predict methylation levels) to filter out, within the target region given by the Bed file, known SNP sites in the dbSNP database and SNP sites arising from specific kinds of variation (such as structural variation and chromosome copy number variation), obtaining the CpG sites of the ctDNA sample to be detected. The Bam file output by the reads level filtering module 130 is then further filtered according to the CpG sites so obtained and the preset number of CpG sites covered by each read (i.e. the minimum requirement for CpG site coverage per read), and reads that do not meet this minimum requirement are filtered out. Finally, the methylation level calculation unit calculates the methylation level of the CpG sites from the reads remaining in the Bam file after filtering by the second filtering unit. The methylation level of each CpG site is the number of reads that meet the minimum requirement, cover the site and are methylated at it, divided by the number of all reads that meet the minimum requirement and cover the site.
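The per-site formula can be sketched in Python as follows. The data layout is an assumption made for illustration: each read is represented as a dictionary mapping the position of every CpG site it covers to a methylated/unmethylated call, and `methylation_levels` is a hypothetical name. The default of 3 covered sites follows the value used in the worked example later in this description.

```python
from collections import defaultdict
from typing import Dict, List


def methylation_levels(reads: List[Dict[int, bool]], min_cpgs: int = 3) -> Dict[int, float]:
    """Per-site methylation level from reads covering at least min_cpgs CpG sites."""
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for calls in reads:
        # Discard reads that cover fewer CpG sites than the minimum requirement.
        if len(calls) < min_cpgs:
            continue
        for position, is_methylated in calls.items():
            covered[position] += 1
            methylated[position] += int(is_methylated)
    # Level = methylated qualifying reads / all qualifying reads covering the site.
    return {pos: methylated[pos] / covered[pos] for pos in sorted(covered)}


reads = [
    {100: True, 105: True, 110: False},
    {100: False, 105: True, 110: False},
    {100: True, 105: True},  # covers only 2 CpG sites, so it is ignored
]
print(methylation_levels(reads))  # {100: 0.5, 105: 1.0, 110: 0.0}
```

Sites that are only covered by discarded reads do not appear in the result, which matches the idea that only reads meeting the minimum requirement contribute to either the numerator or the denominator.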
Meanwhile, the Bam file filtered by the reads level filtering module 130 is processed with Bedtools software (a toolkit for genome arithmetic) in combination with the Bed file to obtain the capture efficiency of the ctDNA sample to be detected; and the filtered Bam file is processed with SAMtools (a tool for processing Bam/sam files) to obtain the sequencing depth of the ctDNA sample to be detected at each site of the target region and to compute statistics such as the average sequencing depth of the ctDNA sample to be detected.
In practical applications, the FASTQ file processing module 110, the to-be-detected sample comparison module 120, the reads level filtering module 130 and the methylation level prediction module 140 may be run separately, that is, as independent modules, or may be integrated to complete the whole process automatically. In the automated methylation data quality control and methylation level prediction process, the one-time inputs are the FASTQ file from methylation target capture sequencing of the ctDNA sample to be detected and the target region Bed file (containing three columns: chromosome, start point and end point). The output files include: a statistics table for the ctDNA sample to be detected (including raw base data volume, raw reads data volume, filtered base data volume, filtered reads data volume, data volume and proportion of reads aligned to the reference genome, data volume after deduplication, total C base content, methylated C base content, unmethylated C base content, methylated C base content in CpG and non-CpG contexts, unmethylated C base content in CpG and non-CpG contexts, C base content before reads level filtering, C-T conversion rate of the sample before reads level filtering, C-T conversion rate of the internal reference lambda DNA, C-T conversion rate of the sample after reads level filtering, data volume after reads level filtering, number of bases in the target region, data volume and proportion captured by the target region, number and proportion of target region bases captured at different sequencing depths, and average sequencing depth), and the methylation level of the CpG sites in the target region of the ctDNA sample to be detected (including five items: chromosome, start point, end point, methylation level and sequencing depth).
Correspondingly, the invention also provides a ctDNA methylation level prediction method based on target region capture sequencing, applied to the ctDNA methylation level prediction apparatus described above. As shown in fig. 2, the ctDNA methylation level prediction method comprises the following steps: S10, acquiring the FASTQ file from capture sequencing of a ctDNA sample to be detected, and preprocessing the FASTQ file to obtain a filtered FASTQ file; S20, aligning the gene sequences in the obtained FASTQ file to a reference genome and removing duplicates to obtain a corresponding Bam file; S30, filtering the reads in the generated Bam file one by one according to a preset C-T conversion rate to obtain a filtered Bam file; S40, further filtering the filtered Bam file according to the target region Bed file and the preset number of CpG sites covered by each read, and predicting the methylation level of the CpG sites from the remaining reads.
Specifically, step S20 includes: aligning the gene sequences in the FASTQ file obtained by the FASTQ file processing module to the human reference genome and to the internal reference lambda DNA reference genome respectively and removing duplicates, to generate a Bam file, an alignment report before deduplication and an alignment report after deduplication for the human reference genome, and a Bam file, an alignment report before deduplication and an alignment report after deduplication for the internal reference lambda DNA reference genome. Step S30 includes: reading the reads in the Bam file line by line, and counting the numbers of methylated and unmethylated bases in the non-CpG context; calculating the C-T conversion rate of each read from the number of methylated non-CpG context bases and the total number of non-CpG context bases; and filtering out reads in the Bam file whose C-T conversion rate is below the preset C-T conversion rate to obtain a filtered Bam file. Step S40 includes: filtering out, within the target region given by the Bed file, known SNP sites in the dbSNP database and SNP sites arising from specific kinds of variation, to obtain the CpG sites of the ctDNA sample to be detected; further filtering the Bam file according to the CpG sites so obtained and the preset number of CpG sites covered by each read; and calculating the methylation level of the CpG sites from the reads remaining in the filtered Bam file.
The ctDNA methylation level prediction method based on target region capture sequencing and its beneficial effects are illustrated below by an example:
1. Sample preparation
ctDNA samples from 6 tumor patients were selected for library construction, target region capture and sequencing, with 2 replicates per patient. The following operations were carried out for each:
1.1 Plasma treatment
1.1.1 After thawing, 15 μL of proteinase K (20 mg/mL) and 50 μL of sodium dodecyl sulfate (SDS) solution (20%) were added per 1 mL of sample. If the plasma volume was less than 4 mL, it was made up with phosphate buffered saline (PBS). The sample was mixed by inversion, incubated at 60 ℃ for 20 min, and placed on ice for 5 min.
1.1.2 Reagents were added to the deep-well plates; the reagents and corresponding amounts added to each deep-well plate are shown in Table 1:
Table 1: List of reagents added to the deep-well plates
1.1.3 Operating the KingFisher FLEX magnetic bead extractor
Before the program is run, a clean magnetic head sleeve is placed at the position designated by the detection program, and the detection program is run to check whether the magnetic head sleeve falls off. After the reagents have been added to the deep-well plates, the START key on the automatic extractor is pressed, and the magnetic head sleeve and the corresponding deep-well plates are loaded in turn as prompted on the display. The START key is pressed again, and the automatic extractor starts to run.
1.1.4 Aspirating the DNA sample:
After the automatic extractor has finished running, deep-well plate No. 7 is taken out first, and then the STOP key is pressed. The DNA sample is transferred with a pipette into the correspondingly labeled centrifuge tube.
1.2 cfDNA library construction
1.2.1 preparation of internal reference
Lambda DNA was added to a 50 μL fragmentation tube and fragmented with an M220 fragmentation instrument; the fragmented internal reference DNA was diluted and added to the samples during library construction. Lambda DNA serves as the internal reference and is used to determine the conversion status of the sample.
1.2.2 Preparation of DNA samples
The DNA extracted from the plasma of the 6 tumor patients was divided into 2 portions with a total amount of 10 ng, and the fragmented internal reference was added to prepare the libraries; the cfDNA samples themselves were not fragmented. Sample operation information is shown in Table 2.
Table 2: sample operation information List
1.3 library preparation procedure:
1.3.1 DNA conversion using the EZ DNA Methylation-Lightning™ Kit (manufactured by Zymo Research)
1.3.1.1 The starting sample volume was 20 μL. If the volume was less than 20 μL, it was made up with water.
1.3.1.2 130 μL of Lightning Conversion Reagent from the kit was added to the DNA sample, mixed by shaking, centrifuged briefly, placed on a PCR instrument, and reacted under the conditions shown in Table 3.
Table 3: conditions of PCR reaction
1.3.1.3 600 μL of M-Binding Buffer from the kit was added to the Zymo-Spin™ IC Column from the kit; the product of the above reaction was then added to the Zymo-Spin™ IC Column containing M-Binding Buffer, mixed thoroughly by pipetting, and left to stand for 2 min. The column was centrifuged at 12000 rpm for 1 min.
1.3.1.4 The liquid in the collection tube was added back to the adsorption column, left to stand for 2 min, centrifuged at 12000 rpm for 1 min, and the waste liquid was discarded.
1.3.1.5 100 μL of M-Wash Buffer from the kit was added, centrifuged at 12000 rpm for 1 min, and the waste liquid was discarded.
1.3.1.6 200 μL of L-Desulphonation Buffer from the kit was added, incubated at room temperature (20-30 ℃) for 15-20 min, centrifuged at 12000 rpm for 1 min, and the waste liquid was discarded.
1.3.1.7 200 μL of M-Wash Buffer from the kit was added, centrifuged at 12000 rpm for 1 min, and the waste liquid was discarded; this step was repeated twice.
1.3.1.8 The adsorption column was placed back into the collection tube, centrifuged at 12000 rpm for 2 min, and the waste liquid was discarded. The adsorption column was opened and left at room temperature for 2-5 min to thoroughly dry the residual wash buffer in the adsorption material.
1.3.1.9 The adsorption column was transferred to a clean centrifuge tube, 20 μL of elution buffer TE was added dropwise onto the center of the adsorption membrane without touching it, the column was left at room temperature for 2-5 min, and centrifuged at 12000 rpm for 1 min.
1.3.1.10 The liquid in the collection tube was added back to the adsorption column, left at room temperature for 2-5 min, and centrifuged at 12000 rpm for 1 min. The tube containing the converted DNA was stored at -20 ℃ (the converted DNA should be used as soon as possible).
1.3.2 DNA pretreatment
1.3.2.1 The PCR instrument was preheated to 95 ℃ in advance, with the heated lid at 105 ℃.
1.3.2.2 The converted, fragmented DNA was placed in a 0.2 mL PCR tube, and low-EDTA TE buffer (Low EDTA TE) was added to bring the total volume to 15 μL.
1.3.2.3 The PCR tube was placed in the PCR instrument, incubated at 95 ℃ for 2 min, then immediately placed on ice and left for 2 min.
1.3.3 Adding the T7 adapter
1.3.3.1 The PCR instrument was preheated to 37 ℃ in advance, with the heated lid at 105 ℃.
1.3.3.2 The reaction system was prepared according to Table 4; the reagents were from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT (produced by Swift Biosciences).
Table 4: list of reagents
1.3.3.3 25 μL of the reagent was added to the PCR tube containing the pretreated DNA sample on ice, mixed thoroughly by pipetting, and centrifuged briefly.
1.3.3.4 the PCR tube was set in a PCR machine and the reaction was carried out under the conditions shown in Table 5.
Table 5: reaction conditions
1.3.4 Second-strand synthesis reaction
1.3.4.1 The PCR instrument was preheated to 98 ℃ in advance, with the heated lid at 105 ℃.
1.3.4.2 Reagents were prepared according to Table 6; the reagents were from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT (produced by Swift Biosciences).
Table 6: list of reagents
1.3.4.3 … μL of the reagent shown in Table 6 was added to the reaction system from the previous step, mixed thoroughly by pipetting, and centrifuged briefly.
1.3.4.4 The PCR tube was placed in a PCR instrument and the second-strand synthesis reaction was carried out under the conditions shown in Table 7.
Table 7: Reaction conditions for second-strand synthesis
1.3.4.5 The purification magnetic beads were taken out of 4 ℃ storage in advance and equilibrated at room temperature for half an hour.
1.3.4.6 After the reaction in the previous step was completed, 101 μL of magnetic beads was added to the product and mixed by pipetting.
1.3.4.7 The tube was left at room temperature for 5 min and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.3.4.8 200 μL of 80% ethanol was added, incubated for 30 sec, and discarded. The 80% ethanol must be freshly prepared. The 200 μL 80% ethanol wash step was repeated once.
1.3.4.9 Residual ethanol at the bottom of the centrifuge tube was removed with a 10 μL pipette tip, and the tube was dried at room temperature until the ethanol had completely evaporated.
1.3.4.10 The tube was removed from the magnetic rack, 16 μL of ultrapure water was added, and the contents were mixed by shaking and incubated at room temperature for 2 min.
1.3.4.11 The tube was centrifuged briefly and placed on a magnetic rack until the liquid was clear, and 15 μL of the sample was transferred to a new centrifuge tube.
1.3.5 Adding the T5 adapter
1.3.5.1 Reagents were prepared according to Table 8; the reagents were from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT (produced by Swift Biosciences). 15 μL of the reaction system was added to the sample from the previous step, mixed thoroughly by pipetting, and centrifuged briefly.
Table 8: list of reagents
1.3.5.2 the PCR tubes were placed in a PCR machine and the PCR reactions were performed according to the conditions of Table 9.
Table 9: conditions of PCR reaction
1.3.5.3 The purification magnetic beads were taken out of 4 ℃ storage in advance and equilibrated at room temperature for half an hour.
1.3.5.4 After the ligation reaction was completed, 36 μL of magnetic beads was added and mixed by pipetting.
1.3.5.5 The tube was left at room temperature for 5 min and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.3.5.6 200 μL of 80% ethanol was added, incubated for 30 sec, and discarded. The 80% ethanol must be freshly prepared. The 200 μL 80% ethanol wash step was repeated once.
1.3.5.7 Residual ethanol at the bottom of the centrifuge tube was removed with a 10 μL pipette tip, and the tube was dried at room temperature until the ethanol had completely evaporated.
1.3.5.8 The tube was removed from the magnetic rack, 20 μL of ultrapure water was added, and the contents were mixed by shaking and incubated at room temperature for 2 min.
1.3.5.9 The tube was centrifuged briefly and placed on a magnetic rack until the liquid was clear, and 20 μL of the sample was transferred to a new centrifuge tube.
1.3.6 amplification
1.3.6.1 Reaction reagents were prepared according to Table 10; the reagents in the table were from the ACCEL-NGS METHYL-SEQ DNA LIBRARY KIT (produced by Swift Biosciences). 30 μL of the reaction system was added to the sample from the previous step, mixed thoroughly by pipetting, and centrifuged briefly.
Table 10: list of reagents
1.3.6.2 the PCR tubes were placed in a PCR machine and the PCR reactions were performed according to the conditions of Table 11.
Table 11: conditions of PCR reaction
1.3.6.3 The purification magnetic beads were taken out of 4 ℃ storage in advance and equilibrated at room temperature for half an hour.
1.3.6.4 After the reaction was completed, 60 μL of magnetic beads was added and mixed by pipetting.
1.3.6.5 The tube was left at room temperature for 5 min and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.3.6.6 200 μL of 80% ethanol was added, incubated for 30 sec, and discarded. The 80% ethanol must be freshly prepared. The 200 μL 80% ethanol wash step was repeated once.
1.3.6.7 Residual ethanol at the bottom of the centrifuge tube was removed with a 10 μL pipette tip, and the tube was dried at room temperature until the ethanol had completely evaporated.
1.3.6.8 The tube was removed from the magnetic rack, 50 μL of ultrapure water was added, and the contents were mixed by shaking and incubated at room temperature for 2 min.
1.3.6.9 The tube was centrifuged briefly and placed on a magnetic rack until the liquid was clear, and 50 μL of the sample was transferred to a new centrifuge tube.
1.4 library Capture
1.4.1 Mixing libraries:
The total library input per capture was 1 μg. The hybridization reagent was added to the system, mixed by shaking, and centrifuged briefly.
1.4.2 The EP tube was sealed with sealing film, placed in a vacuum centrifugal concentrator and evaporated to dryness (60 ℃, about 20 min to 1 hr). Check regularly whether the sample has evaporated to dryness.
1.4.3 DNA denaturation:
After the samples had completely evaporated to dryness, 7.5 μL of 2× Hybridization Buffer (vial 5) and 3 μL of Hybridization Component A (vial 6) were added to each capture, mixed thoroughly by shaking, centrifuged briefly, and denatured at 95 ℃ for 10 min. Both reagents in this step were from the SeqCap Hyb and Wash Kit (manufactured by Roche).
1.4.4 library hybridization to probes:
1.4.4.1 The probe was taken out and centrifuged briefly.
1.4.4.2 After brief centrifugation, the denatured DNA (kept at 95 ℃ throughout) was quickly transferred to the PCR tube containing the probe, mixed thoroughly by shaking, and centrifuged briefly.
1.4.4.3 The tube was placed in a PCR instrument and hybridized at 47 ℃.
1.4.5 preparation of purification reagents
1.4.5.1 The purification reagents required for capture were prepared as shown in Table 12, with the buffers prepared according to the number of captures. The reagents in the table were from the SeqCap Hyb and Wash Kit (manufactured by Roche).
Table 12: List of purification reagents required for capture
1.4.5.2 Incubation of Capture Beads and Wash Buffer working solution:
The Capture Beads were equilibrated at room temperature for 30 min before use.
The Wash Buffer was incubated at 47 ℃ for 2 hr before use.
1.4.6 post-hybridization purification
1.4.6.1 For each capture, 100 μL of capture beads was dispensed and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.4.6.2 200 μL of 1× Bead Wash Buffer (vial 7) was added, mixed thoroughly by shaking, and placed on the magnetic rack until the liquid was clear; the supernatant was discarded, and this wash was repeated twice. 100 μL of 1× Bead Wash Buffer (vial 7) was then added, mixed thoroughly by shaking, and placed on the magnetic rack until the liquid was clear, and the supernatant was completely discarded. Bead pretreatment was then complete and the next step was carried out immediately.
1.4.6.3 The overnight hybridization mixture of each capture was transferred to the washed magnetic beads and mixed by pipetting ten times. The tube was placed in a PCR instrument and incubated at 47 ℃ for 45 min (with the PCR heated lid set to 57 ℃), shaking once every 15 min to keep the magnetic beads suspended. The 1× Bead Wash Buffer (vial 7) was from the SeqCap Hyb and Wash Kit (manufactured by Roche).
1.4.7 Washing using the SeqCap Hyb and Wash Kit (manufactured by Roche)
1.4.7.1 After incubation was completed, 100 μL of 1× Wash Buffer I (vial 1) pre-warmed to 47 ℃ was added to each tube and mixed by shaking. The tube was placed on a magnetic rack until the liquid was clear, and the supernatant was discarded. The reagents used in all steps up to and including 1.4.7.6 were from the SeqCap Hyb and Wash Kit (manufactured by Roche).
1.4.7.2 … μL of 1× Stringent Wash Buffer (vial 4) pre-warmed to 47 ℃ was added and mixed by pipetting ten times. The tube was incubated at 47 ℃ for 5 min and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.4.7.3 … μL of 1× Stringent Wash Buffer (vial 4) pre-warmed to 47 ℃ was added and mixed by pipetting ten times. The tube was incubated at 47 ℃ for 5 min and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.4.7.4 … μL of room-temperature 1× Wash Buffer I (vial 1) was added, shaken for 2 min, centrifuged briefly, and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.4.7.5 … μL of room-temperature 1× Wash Buffer II (vial 2) was added, shaken for 1 min, centrifuged briefly, and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.4.7.6 … μL of room-temperature 1× Wash Buffer III (vial 3) was added, shaken for 30 sec, centrifuged briefly, and placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.4.7.7 36 μL of ultrapure water was added to the centrifuge tube for elution and mixed thoroughly by shaking, and the next amplification step was carried out.
1.4.8 PCR reaction
1.4.8.1 The mixture was prepared according to Table 13, scaled to the number of captures, and mixed thoroughly by shaking. The reagents in the table were all from the SeqCap Hyb and Wash Kit (manufactured by Roche).
Table 13: List of reagents for preparing the mixture
1.4.8.2 The mixture was centrifuged briefly and dispensed into PCR tubes at 30 μL/tube. Each captured sample was divided into two tubes for PCR amplification, with 20 μL of sample per tube.
1.4.8.3 The above samples were transferred to the PCR reaction mixture, mixed by shaking, and centrifuged briefly.
1.4.8.4 The tubes were placed on a PCR instrument and the PCR reaction was carried out under the conditions shown in Table 14.
Table 14: conditions of PCR reaction
1.4.9 purification after amplification
1.4.9.1 The purification magnetic beads were taken out and equilibrated at room temperature for 30 min.
1.4.9.2 … μL of purification magnetic beads was placed in a 1.5 mL centrifuge tube, 100 μL of the amplified captured DNA library was added, mixed thoroughly by shaking, and incubated at room temperature for 15 min.
1.4.9.3 The tube was placed on a magnetic rack until the liquid was clear, and the supernatant was discarded.
1.4.9.4 200 μL of 80% ethanol was added, incubated for 30 sec, and discarded. The 80% ethanol must be freshly prepared. The 200 μL 80% ethanol wash step was repeated once.
1.4.9.5 Residual ethanol at the bottom of the centrifuge tube was removed with a 10 μL pipette tip, and the tube was dried at room temperature until the ethanol had completely evaporated.
1.4.9.6 The tube was removed from the magnetic rack, 120 μL of ultrapure water was added, and the contents were mixed by shaking and incubated at room temperature for 2 min.
1.4.9.7 The tube was centrifuged briefly and placed on a magnetic rack until the liquid was clear, and the captured sample was transferred to a new centrifuge tube.
1.5 library pooling and sequencing
The mass of each capture to be pooled was calculated according to the data volume ratio, and the different captures were mixed into one sample according to that ratio. A PhiX library was then added to form the sample for loading on the sequencer, and sequencing was performed. PhiX is a phage; its library can mitigate base imbalance and can also serve as a control for evaluating sequencing quality.
2. Processing the off-machine FASTQ files into input files usable by each module and software
After the data came off the sequencer, the FASTQ files were first processed into Bam files. The specific software and steps were as follows:
2.1 Adapter removal
Trimmomatic-0.36 was called to process each pair of FASTQ files as paired reads, removing adapters and low-quality bases and generating adapter-trimmed FASTQ files. Specifically, after the adapter sequence is clipped, bases with a base quality below 20 at the beginning and end of the remaining portion are trimmed; a sliding window of size 5 is then moved from the 5' end of each read and the average base quality in the window is calculated; if the average base quality in a window is below 20, the read is trimmed from that window onward; and the number of bases remaining after trimming is required to exceed 75.
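For orientation, the settings above could correspond to a Trimmomatic paired-end invocation along the following lines. This Python sketch only assembles the argument list and does not run anything. The jar path, the file names, the adapter FASTA file, the `-phred33` option and the `ILLUMINACLIP` parameters `2:30:10` are assumptions chosen for illustration and are not given in this description; `MINLEN` is set to 76 because Trimmomatic keeps reads of at least the given length, while the text requires more than 75 bases.

```python
from typing import List


def trimmomatic_pe_command(r1: str, r2: str, prefix: str,
                           jar: str = "trimmomatic-0.36.jar",
                           adapters: str = "adapters.fa",
                           min_quality: int = 20, window: int = 5,
                           min_len_exclusive: int = 75) -> List[str]:
    """Build (but do not run) a hypothetical Trimmomatic paired-end command."""
    return [
        "java", "-jar", jar, "PE", "-phred33",
        r1, r2,
        # Paired and unpaired outputs for read 1 and read 2 (names are placeholders).
        f"{prefix}_1P.fq.gz", f"{prefix}_1U.fq.gz",
        f"{prefix}_2P.fq.gz", f"{prefix}_2U.fq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",       # adapter clipping (assumed parameters)
        f"LEADING:{min_quality}",                 # trim low-quality bases at the start
        f"TRAILING:{min_quality}",                # trim low-quality bases at the end
        f"SLIDINGWINDOW:{window}:{min_quality}",  # window of 5, mean quality 20
        f"MINLEN:{min_len_exclusive + 1}",        # keep reads longer than 75 bases
    ]


print(" ".join(trimmomatic_pe_command("sample_R1.fq.gz", "sample_R2.fq.gz", "sample")))
```

The order of the trimming steps in the list mirrors the order described in the text, since Trimmomatic applies the steps in the order in which they are given.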
2.2 alignment
Bismark-v0.19.0 was called to align the adapter-trimmed FASTQ files, as paired reads, to the hg19 human reference genome sequence and the lambda DNA reference genome sequence, generating initial Bam files and alignment reports.
2.3 Deduplication
The deduplication module of Bismark-v0.19.0 was called to deduplicate the initial Bam file, generating a deduplicated Bam file and a deduplication report.
2.4 Sorting and tagging
The sort module of SAMtools-1.3 was called to sort the deduplicated Bam file, generating a sorted Bam file; the AddOrReplaceReadGroups module of Picard-2.1.0 (a tool for processing high-throughput sequencing data, which can be used to process alignment result files in formats such as sam/Bam) was then called to add read group tags to the sorted Bam file.
2.5 screening
The clipOverlap module of BamUtil-1.0.14 was called to screen the read-group-tagged Bam file: for paired reads with overlapping bases, the CIGAR value of the read aligned to the negative strand of the reference sequence was converted, generating a Bam file with overlapping sequences removed. SAMtools-1.3 view was then called to filter this Bam file by alignment quality (which quantifies the probability of alignment to a wrong position; the higher the value, the lower that probability), requiring an alignment quality above 20, to generate the final Bam file. The CIGAR value records the alignment details of each read in the Bam file.
2.6 building an index
The index module of SAMtools-1.3 was called to build an index for the final Bam file, generating the bai file paired with it.
2.7 data statistics
FASTQC-0.11.3 was called to count the base data volume, reads data volume, base distribution and so on of the FASTQ files before and after adapter removal. The total C base content, methylated C base content, unmethylated C base content, methylated C base content in CpG and non-CpG contexts, and unmethylated C base content in CpG and non-CpG contexts were taken from the human reference genome alignment report generated during alignment. The intersect module of Bedtools-v2.26.0 was called to count the number of bases in the target region in the final Bam file, and the data volume and proportion captured by the target region. SAMtools-1.3 was called to compute, for the final Bam file, the sequencing depth, the average sequencing depth, and the number and proportion of target region bases captured at different sequencing depths.
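The depth statistics named above (average sequencing depth, and the number and proportion of target region bases at different depths) reduce to simple arithmetic once a per-base depth list for the target region is available. The Python sketch below assumes such a list has already been extracted; the function name `depth_summary` and the depth thresholds are hypothetical and are not taken from this description.

```python
from typing import Dict, Sequence


def depth_summary(depths: Sequence[int],
                  thresholds: Sequence[int] = (1, 100, 500)) -> Dict[str, float]:
    """Average depth plus count and fraction of target bases at or above each threshold."""
    n = len(depths)
    if n == 0:
        raise ValueError("the target region contains no bases")
    summary: Dict[str, float] = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        covered = sum(d >= t for d in depths)
        summary[f"bases_ge_{t}"] = covered       # number of bases with depth >= t
        summary[f"frac_ge_{t}"] = covered / n    # proportion of the target region
    return summary


# Four target bases with depths 0, 50, 100 and 250.
print(depth_summary([0, 50, 100, 250], thresholds=(1, 100)))
# {'mean_depth': 100.0, 'bases_ge_1': 3, 'frac_ge_1': 0.75, 'bases_ge_100': 2, 'frac_ge_100': 0.5}
```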
3. Direct identification of CpG methylation levels in ctDNA samples by the conventional method
The final Bam file was processed with BisSNP software. First, the BisulfiteCountCovariates and BisulfiteTableRecalibration modules of BisSNP-0.82.2 were called to perform base quality recalibration, generating a recalibration csv file and a recalibrated Bam file. Next, the BisulfiteGenotyper module and the target region Bed file were used to identify the SNP sites and CpG sites of the sample to be detected, generating raw VCF files for the SNPs and CpGs. Finally, the VCFpostprocess module was called to filter the CpG sites according to the generated VCF files, giving the final CpG sites and their methylation levels.
4. Identification of CpG methylation levels in ctDNA samples using the method of the invention
With the final Bam file as the input file and a minimum requirement set for the C-T conversion rate of the non-CpG context, the reads level filtering module of the invention was called. It read the Bam file line by line, determined whether the non-CpG context of each read met the minimum C-T conversion rate requirement, retained the reads that met the requirement, and generated the filtered Bam file. Then, with the filtered Bam file and the CpG sites identified by BisSNP-0.82.2 as input files and the requirement that each read contain at least 3 CpG sites, the methylation level prediction module was called to filter out the reads in the Bam file that did not meet the requirement and then to calculate the methylation level of each CpG site.
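The requirement that each read contain at least 3 of the identified CpG sites can be checked efficiently when the CpG positions are kept sorted. The Python sketch below is an illustration under assumptions: positions are 0-based on a single chromosome, a read is described by a half-open interval `[start, end)`, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def cpgs_covered(read_start: int, read_end: int, cpg_positions: Sequence[int]) -> int:
    """Count CpG positions inside the half-open interval [read_start, read_end).

    cpg_positions must be sorted in ascending order.
    """
    return bisect_left(cpg_positions, read_end) - bisect_left(cpg_positions, read_start)


def keep_read(read_start: int, read_end: int, cpg_positions: Sequence[int],
              min_cpgs: int = 3) -> bool:
    """True if the read covers at least min_cpgs of the identified CpG sites."""
    return cpgs_covered(read_start, read_end, cpg_positions) >= min_cpgs


cpgs = [100, 120, 150, 300]
print(keep_read(90, 160, cpgs))   # True: covers 100, 120 and 150
print(keep_read(110, 200, cpgs))  # False: covers only 120 and 150
```

Binary search keeps the cost per read logarithmic in the number of CpG sites, which matters when a Bam file is scanned line by line.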
Comparison of methylation level prediction by the conventional method and the method of the invention
Methylation levels were predicted for 6 pairs of replicate samples using each method, and the correlation between the replicate samples was compared. The results are as follows:
5.1 For each method, the consistency between replicate samples of the predicted methylation levels at CpG sites whose methylation level is less than 1 in both replicates is shown in Table 15. The Sample column of Table 15 lists the paired replicate samples used to calculate the correlation. The non-C-T-BisSNP column gives the correlation coefficient of the conventional method, in which reads are not filtered by C-T conversion rate and the methylation level is calculated with BisSNP-0.82.2. The C-T-BisSNP column gives the correlation coefficient of the method in which the methylation level is calculated with BisSNP-0.82.2 after C-T conversion rate filtering. The C-T-estimate column gives the correlation coefficient of the method of the invention.
Table 15: list of correlation coefficients (all sites) for methylation level prediction results for each replicate sample under different methods
5.2 For each method, the consistency between replicate samples of the predicted methylation levels at CpG sites whose methylation level is less than 0.02 is shown in Table 16. The Sample column of Table 16 lists the paired replicate samples used to calculate the correlation. The non-C-T-BisSNP column gives the correlation coefficient of the conventional method, in which reads are not filtered by C-T conversion rate and the methylation level is calculated with BisSNP-0.82.2. The C-T-BisSNP column gives the correlation coefficient of the method in which the methylation level is calculated with BisSNP-0.82.2 after C-T conversion rate filtering. The C-T-estimate column gives the correlation coefficient of the method of the invention.
Table 16: list of correlation coefficients (low methylation level sites) for methylation level prediction results of each replicate sample under different methods
As can be seen from the tables, compared with the non-C-T-BisSNP and C-T-BisSNP methods, the reads level filtering module and the methylation level prediction module added in the invention improve the correlation between replicate samples at sites with low methylation levels, and are therefore better suited to methylation level prediction for ctDNA.
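The replicate comparison can be sketched in Python as follows. The text does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, as is the rule that a site is included only when its level is below the threshold in both replicates. The function names and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep_a, rep_b, max_level):
    """Correlate two replicates over shared sites below max_level in both.

    rep_a, rep_b: dicts mapping a CpG site identifier to its methylation level.
    max_level: upper bound on the level, e.g. 1 (Table 15) or 0.02 (Table 16).
    """
    shared = [site for site in rep_a
              if site in rep_b
              and rep_a[site] < max_level and rep_b[site] < max_level]
    if len(shared) < 2:
        return None  # not enough sites to correlate
    return pearson([rep_a[s] for s in shared], [rep_b[s] for s in shared])


if __name__ == "__main__":
    rep_a = {"s1": 0.0, "s2": 0.01, "s3": 0.015, "s4": 0.9}
    rep_b = {"s1": 0.0, "s2": 0.01, "s3": 0.015, "s4": 0.8}
    print(replicate_correlation(rep_a, rep_b, max_level=0.02))
```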
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of program modules is illustrated, and in practical applications, the above-described distribution of functions may be performed by different program modules, that is, the internal structure of the apparatus may be divided into different program units or modules to perform all or part of the above-described functions. Each program module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one processing unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software program unit. In addition, the specific names of the program modules are only used for distinguishing the program modules from one another, and are not used for limiting the protection scope of the application.
Fig. 3 is a schematic structural diagram of a terminal device provided in an embodiment of the present invention. As shown, the terminal device 200 includes a processor 220, a memory 210, and a computer program 211 stored in the memory 210 and executable on the processor 220, for example a ctDNA methylation level prediction program based on target region capture sequencing. When executing the computer program 211, the processor 220 implements the steps of the embodiments of the ctDNA methylation level prediction method based on target region capture sequencing described above, or implements the functions of the modules of the embodiments of the ctDNA methylation level prediction apparatus based on target region capture sequencing described above.
The terminal device 200 may be a notebook computer, a palmtop computer, a tablet computer, a mobile phone, or the like. The terminal device 200 may include, but is not limited to, the processor 220 and the memory 210. Those skilled in the art will appreciate that Fig. 3 is merely an example of the terminal device 200 and does not limit it; the terminal device 200 may include more or fewer components than shown, may combine some components, or may use different components. For example, the terminal device 200 may also include input/output devices, display devices, network access devices, buses, and the like.
The Processor 220 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general purpose processor 220 may be a microprocessor or the processor may be any conventional processor or the like.
The memory 210 may be an internal storage unit of the terminal device 200, such as a hard disk or a memory of the terminal device 200. The memory 210 may also be an external storage device of the terminal device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the terminal device 200. Further, the memory 210 may include both an internal storage unit of the terminal device 200 and an external storage device. The memory 210 is used to store the computer program 211 and other programs and data required by the terminal device 200. The memory 210 may also be used to temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or recited in detail in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by sending instructions to relevant hardware through the computer program 211, where the computer program 211 may be stored in a computer readable storage medium, and when the computer program 211 is executed by the processor 220, the steps of the method embodiments may be implemented. Wherein the computer program 211 comprises: computer program code which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying the code of computer program 211, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the content of the computer readable storage medium can be increased or decreased according to the requirements of the legislation and patent practice in the jurisdiction, for example: in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for persons skilled in the art, numerous modifications and adaptations can be made without departing from the principle of the present invention, and such modifications and adaptations should be considered as within the scope of the present invention.
Claims (10)
1. A ctDNA methylation level prediction apparatus based on target region capture sequencing, comprising:
a FASTQ file processing module, used for acquiring a FASTQ file from capture sequencing of a ctDNA sample to be detected, and performing a preprocessing operation on the FASTQ file to obtain a filtered FASTQ file;
a to-be-detected sample comparison module, used for aligning the gene sequences in the FASTQ file obtained by the FASTQ file processing module with a reference genome and removing duplicates to obtain a corresponding Bam file;
a reads level filtering module, used for filtering the reads in the Bam file generated by the to-be-detected sample comparison module one by one according to a preset C-T conversion rate to obtain a filtered Bam file;
and a methylation level prediction module, used for further filtering the Bam file output by the reads level filtering module according to the target region Bed file and the preset number of CpG sites covered by each read, and predicting the methylation level of the CpG sites according to the remaining reads.
2. The ctDNA methylation level prediction device according to claim 1, wherein
in the FASTQ file processing module, the preprocessing operation performed on the acquired FASTQ file includes: removing adapters and low-quality reads; and/or
in the to-be-detected sample comparison module, the gene sequences in the FASTQ file obtained by the FASTQ file processing module are aligned with the human reference genome and with the internal reference lambda DNA reference genome respectively and de-duplicated, generating a Bam file, an alignment report before de-duplication and an alignment report after de-duplication for the human reference genome, and a Bam file, an alignment report before de-duplication and an alignment report after de-duplication for the internal reference lambda DNA reference genome.
3. The ctDNA methylation level prediction device according to claim 1 or 2, wherein the reads level filtering module comprises:
a methylation number counting unit, used for reading the reads in the Bam file generated by the to-be-detected sample comparison module line by line and counting the numbers of methylated and unmethylated bases in the non-CpG context;
a C-T conversion rate calculation unit, used for calculating the C-T conversion rate of each read according to the numbers of methylated and unmethylated non-CpG context bases;
and a first filtering unit, used for filtering out reads in the Bam file whose C-T conversion rate is smaller than the preset C-T conversion rate to obtain a filtered Bam file.
4. The ctDNA methylation level prediction device according to claim 1 or 2, wherein the methylation level prediction module comprises:
a second filtering unit, used for filtering out the known SNP sites in the dbSNP database and the SNP sites generated by specific variation causes according to the target region Bed file to obtain the CpG sites of the ctDNA sample to be detected, and for further filtering the Bam file output by the reads level filtering module according to the CpG sites obtained by filtering and the preset number of CpG sites covered by each read;
and a methylation level calculation unit, used for calculating the methylation level of the CpG sites according to the reads remaining in the Bam file after filtering by the second filtering unit.
5. A ctDNA methylation level prediction method based on target region capture sequencing, comprising the following steps:
acquiring a FASTQ file from capture sequencing of a ctDNA sample to be detected, and performing a preprocessing operation on the FASTQ file to obtain a filtered FASTQ file;
aligning the gene sequences in the obtained FASTQ file with a reference genome and removing duplicates to obtain a corresponding Bam file;
filtering reads in the generated Bam file one by one according to a preset C-T conversion rate to obtain a filtered Bam file;
and further filtering the filtered Bam file according to the Bed file of the target area and the preset number of covered CpG sites in each read, and predicting the methylation level of the CpG sites according to the residual reads.
6. The ctDNA methylation level prediction method of claim 5, wherein,
the acquiring of the FASTQ file from capture sequencing of the ctDNA sample to be detected and the performing of the preprocessing operation on the FASTQ file to obtain the filtered FASTQ file comprises: removing adapters and low-quality reads from the acquired FASTQ file; and/or
the aligning of the gene sequences in the obtained FASTQ file with a reference genome and removing duplicates to obtain a corresponding Bam file comprises: aligning the gene sequences in the obtained FASTQ file with a human reference genome and with an internal reference lambda DNA reference genome respectively and removing duplicates, generating a Bam file, an alignment report before de-duplication and an alignment report after de-duplication for the human reference genome, and a Bam file, an alignment report before de-duplication and an alignment report after de-duplication for the internal reference lambda DNA reference genome.
7. The ctDNA methylation level prediction method of claim 5 or 6, wherein the filtering of reads in the generated Bam file one by one according to a preset C-T conversion rate to obtain a filtered Bam file comprises:
reading the reads in the Bam file line by line, and counting the numbers of methylated and unmethylated bases in the non-CpG context;
calculating the C-T conversion rate of each read according to the numbers of methylated and unmethylated non-CpG context bases;
and filtering out reads in the Bam file whose C-T conversion rate is smaller than the preset C-T conversion rate to obtain a filtered Bam file.
8. The ctDNA methylation level prediction method of claim 5 or 6, wherein,
the step of further filtering the filtered Bam file according to the target region Bed file and the preset number of CpG sites covered by each read, and predicting the methylation level of the CpG sites according to the remaining reads, comprises:
filtering out the known SNP sites in the dbSNP database and the SNP sites generated by specific variation causes according to the target region Bed file to obtain the CpG sites of the ctDNA sample to be detected;
further filtering the Bam file according to the CpG sites obtained by filtering and the preset number of CpG sites covered by each read;
and calculating the methylation level of the CpG sites according to the reads remaining in the filtered Bam file.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the ctDNA methylation level prediction method based on target region capture sequencing of any one of claims 5-8.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the ctDNA methylation level prediction method for target region capture based sequencing as claimed in any one of claims 5-8.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110072090.5A CN112397150B (en) | 2021-01-20 | 2021-01-20 | ctDNA methylation level prediction device and method based on target region capture sequencing |
PCT/CN2021/091761 WO2022156089A1 (en) | 2021-01-20 | 2021-04-30 | Dna methylation sequencing analysis methods |
EP21920475.7A EP4268231A4 (en) | 2021-01-20 | 2021-04-30 | Dna methylation sequencing analysis methods |
US17/490,549 US20220228209A1 (en) | 2021-01-20 | 2021-09-30 | Dna methylation sequencing analysis methods |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110072090.5A CN112397150B (en) | 2021-01-20 | 2021-01-20 | ctDNA methylation level prediction device and method based on target region capture sequencing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112397150A (en) | 2021-02-23 |
CN112397150B (en) | 2021-04-20 |
Family
ID=74625183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110072090.5A Active CN112397150B (en) | 2021-01-20 | 2021-01-20 | ctDNA methylation level prediction device and method based on target region capture sequencing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112397150B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN112397150B (en) | 2021-04-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | ||
Address after: 100191 903, 9 / F, healthsmart Valley Building, 35 Huayuan North Road, Haidian District, Beijing Patentee after: Zhenhe (Beijing) Biotechnology Co.,Ltd. Patentee after: Wuxi Zhenhe Biotechnology Co.,Ltd. Address before: 100191 903, 9 / F, healthsmart Valley Building, 35 Huayuan North Road, Haidian District, Beijing Patentee before: Zhenhe (Beijing) Biotechnology Co.,Ltd. Patentee before: Wuxi Zhenhe Biotechnology Co.,Ltd. |