CN112634984B - Method, device and storage medium for simultaneously detecting DNA methylation and genome variation


Publication number
CN112634984B
CN112634984B CN202011598472.3A CN202011598472A CN112634984B CN 112634984 B CN112634984 B CN 112634984B CN 202011598472 A CN202011598472 A CN 202011598472A CN 112634984 B CN112634984 B CN 112634984B
Authority
CN
China
Prior art keywords
sample
detected
methylation
dna
mutation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011598472.3A
Other languages
Chinese (zh)
Other versions
CN112634984A (en
Inventor
刘涛
崔添毓
方欢
李敏
管彦芳
杨玲
易鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiyinjia Medical Laboratory Co ltd
Original Assignee
Beijing Jiyinjia Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiyinjia Medical Laboratory Co ltd
Priority to CN202011598472.3A
Publication of CN112634984A
Application granted
Publication of CN112634984B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Genetics & Genomics (AREA)
  • Medical Informatics (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed are a method, a device and a storage medium for simultaneously detecting DNA methylation and genome variation. The method comprises a methylation site repair step: repairing the methylation sites to the original C or G bases according to the DNA methylation information of the sample to be tested, to obtain repaired sequencing data of the sample to be tested; and a variation detection step: using the repaired sequencing data of the sample to be tested as the input for variation detection, and performing variation detection on the sample to be tested to obtain its genome variation information. With this method, a single sequencing run of the DNA library of the sample to be tested yields both the DNA methylation information and the genome variation information of the sample, which shortens the turnaround of genome variation detection, avoids false positive mutation calls caused by DNA methylation, and improves the accuracy of genome variation detection.

Description

Method, device and storage medium for simultaneously detecting DNA methylation and genome variation
Technical Field
The application relates to the field of gene mutation detection, in particular to a method, a device and a storage medium for simultaneously detecting DNA methylation and genome variation.
Background
DNA methylation is a form of chemical modification of DNA that can alter genetic behavior without altering the DNA sequence. It is one of the most studied epigenetic regulatory mechanisms to date. The modification is normal and common in eukaryotic cells, yet it affects gene expression. Methylation can take several forms: the modified position may be N-6 of adenine, N-4 of cytosine, N-7 of guanine or C-5 of cytosine, each catalyzed by a different DNA methylase. Although the modification patterns are varied, most methylation occurs within genes and in transposable regions, whereas the degree of methylation of CpG islands is relatively low (about 10%). Studies have shown that hypermethylation of promoter regions, leading to inactivation of tumor suppressor genes, is one of the common features of human tumors.
Bisulfite sequencing (hereinafter referred to as BS) is a well-established method for detecting DNA methylation. Bisulfite deaminates unmethylated C in DNA to U while methylated C remains unchanged, and during PCR amplification every U is replaced by T. The PCR product is then sequenced and compared with a reference genome to determine whether each CpG site is methylated. The method is highly reliable and precise, and can determine the methylation state of every CpG site in a target fragment.
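As a non-limiting illustration, the bisulfite read-out described above may be sketched in Python as follows. The sketch only simulates the chemistry on a single strand; the function names `bisulfite_convert` and `cpg_methylation_calls`, the toy reference sequence and the set of methylated positions are hypothetical and are not part of the present application.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate the read-out of one strand after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U and read as T after PCR;
    methylated C is protected and stays C.
    `methylated` holds 0-based positions of methylated cytosines (assumed input).
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def cpg_methylation_calls(reference: str, converted: str) -> Dict[int, bool]:
    """For each CpG in the reference, report whether the C survived conversion."""
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            calls[i] = converted[i] == "C"  # True means methylated
    return calls


reference = "ACGTCCGA"                      # toy reference, CpGs at positions 1 and 5
converted = bisulfite_convert(reference, methylated={1})
print(converted)
print(cpg_methylation_calls(reference, converted))
```

In this toy case the CpG at position 1 is reported as methylated and the CpG at position 5 as unmethylated, because only the former retains its C after conversion.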
However, this sequencing method has an obvious drawback: the bisulfite introduced during library construction fragments the DNA, so that sample is lost. Detecting DNA methylation with BS therefore consumes more resources; at least 10 clones must be sequenced to obtain reliable data, and the procedure is cumbersome and expensive. In addition, BS data show a pronounced base imbalance. In normal sequencing the base content is approximately G 20%, C 20%, A 30% and T 30%, whereas in BS data it is approximately G 20%, C close to 0, A 30% and T 50%.
TET-assisted pyridine borane sequencing (TAPS) requires no bisulfite. TET oxidizes 5mC and 5hmC to 5caC, pyridine borane then reduces 5caC to dihydrouracil (DHU), and PCR converts DHU to thymine. The target sequence can thus be subjected directly to DNA methylation sequencing, making TAPS a less destructive and more efficient method of DNA methylation sequencing at single-base resolution.
However, with conventional sequencing technology, detecting genome variation information together with DNA methylation requires two samples that are processed and sequenced separately, which is costly and slow.
How to improve the efficiency of obtaining DNA methylation information and mutation information of a sample is a difficulty in genome variation detection.
Disclosure of Invention
The present application aims to provide a method, a device and a storage medium for simultaneously detecting DNA methylation and genome variation, so as to improve the efficiency of obtaining DNA methylation information and mutation information.
In order to achieve the purpose, the following technical scheme is adopted in the application:
a first aspect of the present application discloses a method for simultaneous detection of DNA methylation and genomic variations, comprising:
a methylation site repair step: repairing the methylation sites to the original C or G bases according to the DNA methylation information of the sample to be tested, to obtain repaired sequencing data of the sample to be tested;
a variation detection step: using the repaired sequencing data of the sample to be tested as the input for variation detection, and performing variation detection on the sample to be tested to obtain its genome variation information.
The key point of the present application is as follows. After the DNA library of the sample to be tested is obtained, sequencing data of the sample are first obtained by high-depth sequencing, and the methylation sites of the sample are determined from these data. Before the sequencing data are used as input for variation detection, the methylation sites are repaired to the original C or G bases according to the sequencing data at those sites. Variation detection is then performed with the repaired sequencing data as input, yielding the genome variation information of the sample, such as single base mutation (SNV), base insertion and deletion (InDel), copy number variation (CNV) and chromosomal structural variation (SV). For the detection of single base mutations in particular, because the methylation sites are repaired in advance, sequencing data that still carry methylation-induced base changes never enter the variation detection process, so false positive mutations caused by DNA methylation are avoided.
By sequencing the DNA library of the sample to be detected once, the DNA methylation information and the genome variation information of the sample to be detected can be obtained simultaneously, the period of genome variation detection is shortened, the detection of false positive mutation caused by DNA methylation is avoided, and the accuracy of genome variation detection is improved.
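As a non-limiting illustration, the methylation site repair step may be sketched in Python as follows. The sketch assumes that each read is aligned to the reference without gaps and that the methylation sites have already been called; the function name `repair_read` and the dictionary layout (reference position mapped to the original base, "C" or "G") are hypothetical and are not prescribed by the present application.

```python
from typing import Dict


def repair_read(read: str, start: int, methylated_sites: Dict[int, str]) -> str:
    """Restore methylation-induced base changes in one aligned read.

    read             -- read sequence in reference orientation (assumed gapless alignment)
    start            -- reference position of the first base of the read
    methylated_sites -- reference position -> original base ('C' or 'G') at a
                        called methylation site (assumed to be known already)

    A T observed at a methylated C site is repaired to C, and an A observed at a
    methylated G site (methylation on the opposite strand) is repaired to G.
    All other bases are left unchanged, so real mutations remain visible to the
    variation detection step.
    """
    bases = list(read)
    for offset, base in enumerate(bases):
        original = methylated_sites.get(start + offset)
        if original == "C" and base == "T":
            bases[offset] = "C"
        elif original == "G" and base == "A":
            bases[offset] = "G"
    return "".join(bases)


# Hypothetical read starting at reference position 100, with methylation
# called at positions 101 (original C) and 106 (original G).
sites = {101: "C", 106: "G"}
print(repair_read("ATGTCCAA", 100, sites))
```

The repaired reads, rather than the raw reads, would then be handed to whichever variant caller is used for the variation detection step.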
In one implementation of the present application, a sequencing step precedes the methylation site repair step, the sequencing step comprising:
extracting DNA of a sample to be detected, carrying out TAPS conversion treatment, and establishing a DNA sequencing library of the sample to be detected by adopting the DNA of the sample to be detected after the TAPS conversion treatment;
and carrying out PCR amplification on the DNA sequencing library of the sample to be detected, and carrying out high-depth sequencing on the amplified product to obtain sequencing data of the sample to be detected.
In an implementation manner of the present application, in the sequencing step, the TAPS conversion treatment specifically includes converting DNA methylation modified cytosine of the sample to be tested into thymine.
It should be noted that, in the present application, TAPS is used to treat the DNA extracted from the sample to be tested, so that methylated C in the DNA of the sample can be converted to T while the integrity of the DNA is preserved as far as possible, which improves the precision and accuracy of genome variation detection.
In one implementation of the present application, the sequencing step is followed by a methylation and mutation distinguishing step, which comprises: counting, from the sequencing data of the sample to be tested obtained in the sequencing step, the base change patterns in the forward template and the reverse template of the DNA of the sample to be tested, and distinguishing methylation states from mutations according to the base change patterns in the forward template and the reverse template.
In one implementation of the present application, the base change patterns in the forward template and the reverse template specifically include:
first base change pattern: F1, F2, R1 and R2 all have base changes, and F1 and F2 have the same number of base changes but of different types;
second base change pattern: F1, F2, R1 and R2 all have base changes, the base changes of F1 and F2 are the same, and the base changes of R1 and R2 are the same;
third base change pattern: F1, F2, R1 and R2 all have base changes, the base changes of F1 and F2 are inconsistent, and the number of base changes of F1 differs from that of F2;
fourth base change pattern: only one of F1 and R1 has a base change, and the base change is C to T;
fifth base change pattern: only one of F1 and R1 has a base change, and the base change is C to A;
wherein F1 is a forward template in a DNA sequencing library of a sample to be detected, R1 is a reverse template in the DNA sequencing library of the sample to be detected, F2 is a forward template complementary to R1, and R2 is a reverse template complementary to F1.
The second aspect of the present application also discloses an apparatus for simultaneously detecting DNA methylation and genomic variations, the apparatus comprising:
a methylation site repair module, configured to repair the methylation sites to the original C or G bases according to the DNA methylation information of the sample to be tested, to obtain repaired sequencing data of the sample to be tested;
a variation detection module, configured to use the repaired sequencing data of the sample to be tested as the input for variation detection, and to perform variation detection on the sample to be tested to obtain its genome variation information.
In one implementation of the present application, the apparatus further comprises a sequencing module, the sequencing module comprising:
extracting DNA of a sample to be detected, carrying out TAPS conversion treatment, and establishing a DNA sequencing library of the sample to be detected by adopting the DNA of the sample to be detected after the TAPS conversion treatment;
and performing high-depth sequencing on the DNA sequencing library of the sample to be tested to obtain sequencing data of the sample to be tested.
In one implementation of the present application, the device further comprises a methylation and mutation differentiation module;
the methylation and mutation distinguishing module is configured to count, from the sequencing data of the sample to be tested obtained by the sequencing module, the base change patterns in the forward template and the reverse template of the DNA of the sample to be tested, and to distinguish methylation states from mutations according to the base change patterns in the forward template and the reverse template.
A third aspect of the present application discloses an apparatus for simultaneous detection of DNA methylation and genomic variations, comprising:
a memory for storing a program;
a processor for implementing the above method for simultaneous detection of DNA methylation and genomic variations by executing a program stored in a memory.
A fourth aspect of the present application discloses a computer-readable storage medium having stored thereon a program executable by a processor to implement the above-described method for simultaneous detection of DNA methylation and genomic variations.
Due to the adoption of the technical scheme, the beneficial effects of the application are as follows:
With the method for simultaneously detecting DNA methylation and genome variation, a single sequencing run of the DNA library of the sample to be tested yields both the DNA methylation information and the genome variation information of the sample, which shortens the turnaround of genome variation detection, avoids false positive mutation calls caused by DNA methylation, and improves the accuracy of genome variation detection.
Drawings
FIG. 1 is a block diagram of an apparatus for simultaneously detecting DNA methylation and genomic variation according to this embodiment.
Detailed Description
The present application will be described in further detail with reference to specific embodiments. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail, to avoid obscuring the core of the present application with excessive description; a detailed description of these operations is unnecessary for those skilled in the art, who can fully understand them from the specification and the general knowledge in the art.
Furthermore, the features, operations or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Likewise, the order of the steps or actions in the method descriptions may be exchanged or adjusted in a manner apparent to one of ordinary skill in the art. Thus, the sequences in the specification serve only to describe a particular embodiment clearly and do not imply a required order, unless it is otherwise stated that a certain order must be followed.
The embodiment discloses a method for simultaneously detecting DNA methylation and genome variation, which comprises the following steps:
a methylation site repair step: repairing the methylation sites to the original C or G bases according to the DNA methylation information of the sample to be tested, to obtain repaired sequencing data of the sample to be tested;
a variation detection step: using the repaired sequencing data of the sample to be tested as the input for variation detection, and performing variation detection on the sample to be tested to obtain its genome variation information.
Specifically, after the DNA library of the sample to be tested is obtained, sequencing data of the sample are first obtained by high-depth sequencing, and the methylation sites and methylation rates of the sample are determined from these data. Before the sequencing data are used as input for variation detection, the methylation sites are repaired to the original C or G bases according to the sequencing data at those sites. Variation detection is then performed with the repaired sequencing data as input, yielding the genome variation information of the sample, such as single base mutation (SNV), base insertion and deletion (InDel), copy number variation (CNV) and chromosomal structural variation (SV). For the detection of single base mutations in particular, because the methylation sites are repaired in advance, sequencing data that still carry methylation-induced base changes never enter the variation detection process, so false positive mutations caused by DNA methylation are avoided.
By sequencing the DNA library of the sample to be detected once, the DNA methylation information and the genome variation information of the sample to be detected can be obtained simultaneously, the period of genome variation detection is shortened, the detection of false positive mutation caused by DNA methylation is avoided, and the accuracy of genome variation detection is improved.
In one implementation of this embodiment, a sequencing step precedes the methylation site repair step, the sequencing step comprising:
extracting DNA of a sample to be detected, carrying out TAPS conversion treatment, and establishing a DNA sequencing library of the sample to be detected by adopting the DNA of the sample to be detected after the TAPS conversion treatment;
and performing high-depth sequencing on the DNA sequencing library of the sample to be tested to obtain sequencing data of the sample to be tested.
Specifically, the DNA of the sample to be tested is human somatic DNA, including but not limited to DNA derived from fresh human tissue, paraffin-embedded tissue, plasma, pleural effusion and ascites. The sample to be tested is subjected to DNA extraction and purification for the subsequent TAPS conversion treatment. The purification may be performed in a manner known to those skilled in the art; for example, free DNA in a blood sample to be tested may be extracted, freed of impurities and purified using a magnetic bead method or a spin column method commonly used in the art, and no particular limitation is imposed here. Adapters are then added to the TAPS-converted DNA to construct the DNA library of the sample to be tested.
Further, the DNA sequencing library of the sample to be tested is amplified in a manner known to those skilled in the art. For example, the DNA library may be amplified by PCR, with different primers and probes designed according to the amplification requirement: the whole genome sequence may be amplified without discrimination, the gene sequences of exon regions may be amplified specifically, or particular pathogenic genes may be amplified specifically. The amplification product is then subjected to high-depth sequencing, finally yielding DNA sequencing data (reads) of equal length stored in text files. In a specific implementation of this embodiment, the read length may be 100-200 bp. "High-depth sequencing" means that a given chromosomal region is sequenced many times, for example several thousand to several tens of thousands of times; from this repeated coverage, the proportion of reads in which methylation is detected and the proportion in which it is not, that is, the methylation rate of the region, can be obtained.
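As a non-limiting illustration, the methylation rate obtained from repeated coverage of a site may be sketched in Python as follows. The sketch assumes TAPS data, in which a methylated C is read as T and an unmethylated C is read as C, and assumes that the base calls covering one reference C have already been collected into a string; the function name `site_methylation_rate` and the pileup representation are hypothetical.

```python
from typing import Optional


def site_methylation_rate(pileup_bases: str) -> Optional[float]:
    """Methylation rate at one reference C from the base calls covering it.

    pileup_bases -- one character per covering read, in reference orientation
                    (assumed input, e.g. extracted from an alignment pileup)

    Under TAPS, T counts as 'methylation detected' and C as 'methylation not
    detected'. Other base calls are ignored here; a real pipeline would need
    to separate them out as possible mutations or errors.
    """
    detected = pileup_bases.count("T")
    not_detected = pileup_bases.count("C")
    covered = detected + not_detected
    if covered == 0:
        return None  # site not informative at this depth
    return detected / covered


# Ten hypothetical reads covering one site: three converted, seven not.
print(site_methylation_rate("TTTCCCCCCC"))
```

At the depths mentioned above (thousands of reads per site), the same ratio is simply computed over a much longer string of base calls.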
In an implementation manner of this embodiment, in the sequencing step, the TAPS conversion treatment specifically includes converting DNA methylation modified cytosine in the sample to be tested into thymine.
Specifically, TAPS is used to treat the DNA extracted from the sample to be tested: TET oxidizes 5mC and 5hmC to 5caC, pyridine borane then reduces 5caC to dihydrouracil (DHU), and PCR converts DHU to thymine. In this way methylated C in the DNA of the sample to be tested can be converted to T while the integrity of the DNA is preserved as far as possible, which improves the precision and accuracy of genome variation detection.
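As a non-limiting illustration, the net effect of the TAPS conversion on the sequenced bases may be sketched in Python as follows. The sketch models only the outcome (methylated C read as T, unmethylated C unchanged), not the TET and pyridine borane chemistry; the function names `taps_convert` and `candidate_methylated_sites` and the toy sequence are hypothetical.

```python
from typing import List, Set


def taps_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate the read-out of one strand after TAPS treatment and PCR.

    Methylated C (5mC or 5hmC) ends up as T; unmethylated C stays C.
    `methylated` holds 0-based positions of modified cytosines (assumed input).
    """
    return "".join(
        "T" if base == "C" and i in methylated else base
        for i, base in enumerate(seq)
    )


def candidate_methylated_sites(reference: str, read: str) -> List[int]:
    """Positions where a reference C is read as T.

    These are only candidates: a genuine C-to-T mutation produces the same
    signal on this strand, which is why the strand patterns are compared.
    """
    return [
        i for i, (ref, obs) in enumerate(zip(reference, read))
        if ref == "C" and obs == "T"
    ]


reference = "ACGTCCGA"
read = taps_convert(reference, methylated={1})
print(read)
print(candidate_methylated_sites(reference, read))
```

Unlike bisulfite conversion, only the modified cytosines change, so the converted read keeps most of its C content.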
In one implementation of this embodiment, the sequencing step is followed by a methylation and mutation distinguishing step, which comprises: counting, from the sequencing data of the sample to be tested, the base change patterns in the forward template and the reverse template of the DNA of the sample to be tested, and distinguishing methylation states from mutations according to the base change patterns in the forward template and the reverse template.
Specifically, the DNA forward template sequence in the sequencing data is compared with a reference genome, and the complementary sequence of the DNA reverse template in the sequencing data is compared with the reference genome, the reference genome being a human reference genome. The base change patterns in the DNA forward template and reverse template of the sample to be tested are counted from the comparison results, and methylation states are distinguished from mutations according to these patterns. Once the methylation sites have been determined from the methylation states of the forward template and the reverse template, the sequencing data of the sample to be tested can be repaired, with the methylation sites restored to the original C or G bases for subsequent variation detection.
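As a non-limiting illustration, the comparison of the forward template and of the complement of the reverse template with the reference may be sketched in Python as follows. The sketch assumes gapless alignment at a known start position and uses a toy reference in place of a human reference genome; the function names `reverse_complement` and `base_changes` are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def base_changes(read: str, reference: str, start: int) -> List[Tuple[int, str, str]]:
    """List (position, reference base, observed base) for every mismatch.

    `read` must already be in reference orientation and is assumed to align
    without gaps beginning at `start`.
    """
    window = reference[start:start + len(read)]
    return [
        (start + i, ref, obs)
        for i, (ref, obs) in enumerate(zip(window, read))
        if ref != obs
    ]


reference = "ACGTCCGA"                          # toy stand-in for the reference genome
forward_read = "ATGTCCGA"                       # forward template, compared directly
reverse_read = reverse_complement("ACGTCCAA")   # reverse template as sequenced

print(base_changes(forward_read, reference, 0))
# The reverse template is complemented back to reference orientation first.
print(base_changes(reverse_complement(reverse_read), reference, 0))
```

The resulting lists of changes per template are the raw material from which the base change patterns are counted.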
In one implementation of this embodiment, the base change patterns in the forward template and the reverse template specifically include:
first base change pattern: F1, F2, R1 and R2 all have base changes, and F1 and F2 have the same number of base changes but of different types;
second base change pattern: F1, F2, R1 and R2 all have base changes, the base changes of F1 and F2 are the same, and the base changes of R1 and R2 are the same;
third base change pattern: F1, F2, R1 and R2 all have base changes, the base changes of F1 and F2 are inconsistent, and the number of base changes of F1 differs from that of F2;
fourth base change pattern: only one of F1 and R1 has a base change, and the base change is C to T;
fifth base change pattern: only one of F1 and R1 has a base change, and the base change is C to A;
wherein F1 is a forward template in a DNA sequencing library of a sample to be detected, R1 is a reverse template in the DNA sequencing library of the sample to be detected, F2 is a forward template complementary to R1, and R2 is a reverse template complementary to F1.
Specifically, in the sequencing process, the sequencing data (reads) include sequencing data of a forward template of a DNA double strand, denoted by F, and sequencing data of a reverse template, denoted by R. F1, F2, R1 and R2 are sequence forms observed after sequencing, F1 and F2 are used for describing a forward template in a DNA double strand, R1 and R2 are used for describing a reverse template in the DNA double strand, F1 and R1 are derived from the forward template and the reverse template of a DNA library of a sample to be tested, and F2 and R2 are derived from the forward template and the reverse template formed by amplifying the F1 and R1.
The sequencing data contain UMI sequences. Reads with the same UMI sequence and the same alignment position are placed in the same cluster, and the reads in one cluster are considered to originate from the same double-stranded template. After sequencing, F1 and R1 are therefore in one file and F2 and R2 are in the other, which distinguishes F1 from F2 and R1 from R2; methylation sites and mutation sites can then be distinguished according to the base change information of F1, F2, R1 and R2.
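As a non-limiting illustration, the clustering of reads by UMI sequence and alignment position may be sketched in Python as follows. The record layout (dictionaries with the keys `umi`, `chrom`, `pos` and `name`) and the function name `cluster_reads` are hypothetical; an actual pipeline would read these fields from its alignment files.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Read = Dict[str, object]
ClusterKey = Tuple[str, str, int]


def cluster_reads(reads: List[Read]) -> Dict[ClusterKey, List[Read]]:
    """Group reads that share both the UMI sequence and the alignment position.

    Reads in one cluster are treated as copies of the same double-stranded
    template, as described above.
    """
    clusters: Dict[ClusterKey, List[Read]] = defaultdict(list)
    for read in reads:
        key = (str(read["umi"]), str(read["chrom"]), int(read["pos"]))
        clusters[key].append(read)
    return dict(clusters)


# Hypothetical reads: the first two share UMI and position, the third does not.
reads = [
    {"name": "r1", "umi": "ACGTAC", "chrom": "chr1", "pos": 1000},
    {"name": "r2", "umi": "ACGTAC", "chrom": "chr1", "pos": 1000},
    {"name": "r3", "umi": "TTGCAA", "chrom": "chr1", "pos": 1000},
]
for key, members in cluster_reads(reads).items():
    print(key, [m["name"] for m in members])
```

Requiring both the UMI and the position to match keeps reads from different templates apart even when they happen to carry the same UMI.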
Specifically, in the case of double-strand methylation, C in both F1 and R1 sequences in a DNA library is converted into T by TAPS, so that base changes exist in F1 and F2 in sequencing data, and F1 and F2 have the same number of base changes but different types; for the case of only mutation, if there are C mutation to T in F1 and G mutation to a in R1 in the DNA library, then there are C mutation to T in F2, G mutation to a in R2, base changes in F1, F2, R1, R2, and the same base changes in F1 and F2, R1 and R2 in the sequencing data; for the case of simultaneous mutation and methylation, for example, F1 single-stranded methylation and R1C to T mutation in a DNA library, then F1 has C to T and G to A, F2 has G to A, R1 has C to T, R2 has G to A and C to T in sequencing data, that is, there are base changes in F1, F2, R1 and R2, F1 and F2 base changes are not consistent, and the number of base changes in F1 and the number of base changes in F2 are not consistent; for the case of single-stranded methylation, there is only one base change in F1 and R1 in the sequencing data, and the base change is C to T; in the case of oxidative damage, there was a base change in only one of F1 and R1 in the sequencing data, and the base change was C to a.
Then, based on the base change pattern, the following can be determined: if the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected belong to the first base change pattern, methylation modification exists at the sites of base change in F1 and R1 in the sequencing data of the sample to be detected;
if the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected belong to the second base change pattern, no methylation exists in the sequencing data of the sample to be detected, and mutations exist at the site of base change in F1 and the site of base change in R1;
if the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected belong to the third base change pattern, whichever of F1 and F2 carries the larger number of base changes in the sequencing data of the sample to be detected contains both a single-stranded methylation site and a mutation site, and the one carrying the smaller number of base changes contains only a mutation site;
if the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected belong to the fourth base change pattern, single-stranded methylation modification exists at the site of base change in F1 or R1 in the sequencing data of the sample to be detected;
and if the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected belong to the fifth base change pattern, oxidative damage exists at the site of base change in F1 or R1 in the sequencing data of the sample to be detected.
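The five patterns described above can be illustrated with a minimal Python sketch. It is not the implementation of the present application: the function name `classify_site`, the `"C>T"`-style change labels, and the returned category strings are hypothetical, and the sketch assumes that the set of base changes observed in each of F1, F2, R1 and R2 at a site has already been collected.

```python
from typing import Dict, Iterable

TEMPLATES = ("F1", "F2", "R1", "R2")


def classify_site(changes: Dict[str, Iterable[str]]) -> str:
    """Classify one site from the base changes seen in F1, F2, R1 and R2.

    `changes` maps a template name to the change types observed there,
    written as "C>T", "G>A" and so on (hypothetical notation).
    """
    f1, f2, r1, r2 = (frozenset(changes.get(name, ())) for name in TEMPLATES)

    if f1 and f2 and r1 and r2:
        # All four templates changed: compare F1 with F2 and R1 with R2.
        if f1 == f2 and r1 == r2:
            return "mutation_only"
        if f1 != f2 and len(f1) == len(f2):
            return "double_strand_methylation"
        if len(f1) != len(f2):
            return "methylation_and_mutation"
        return "unclassified"

    if bool(f1) != bool(r1):
        # Exactly one of F1 and R1 changed: the change type decides.
        changed = f1 or r1
        if changed == {"C>T"}:
            return "single_strand_methylation"
        if changed == {"C>A"}:
            return "oxidative_damage"

    return "unclassified"
```

For instance, the mutation-only case described above (C to T in F1 and F2, G to A in R1 and R2) is labelled `"mutation_only"`, while a site where only F1 shows C to A is labelled `"oxidative_damage"`. Combinations not covered by the five patterns are reported as `"unclassified"` rather than forced into a category.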
Those skilled in the art will appreciate that all or part of the functions of the above-described method embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
Thus, in an embodiment of the present application, as shown in FIG. 1, the second aspect of the present application further discloses an apparatus for simultaneously detecting DNA methylation and genomic variation, the apparatus comprising: a methylation site repair module 201 and a mutation detection module 202.
The methylation site repair module 201 is configured to restore each methylation site to its original C base or G base according to the DNA methylation information of the sample to be detected, thereby obtaining repaired sequencing data of the sample to be detected; and the mutation detection module 202 is configured to use the repaired sequencing data of the sample to be detected as the input sample for mutation detection, and to perform mutation detection on the sample to be detected to obtain genomic variation information of the sample to be detected.
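The repair performed by module 201 can be sketched in Python as follows. This is an illustrative assumption only: the function name `repair_read`, the 0-based read coordinates, and the `position -> original base` mapping are hypothetical and are not specified in the present application. The sketch restores a base only when the read actually shows the expected conversion product (T for an original C, A for an original G), so positions that were not converted are left untouched.

```python
from typing import Dict

# Conversion product expected at a methylated site, keyed by the original base.
CONVERSION_PRODUCT = {"C": "T", "G": "A"}


def repair_read(seq: str, methylated_sites: Dict[int, str]) -> str:
    """Return a copy of `seq` with methylation sites restored to C or G.

    `methylated_sites` maps a 0-based read position to the original base
    ("C" or "G") reported by the methylation analysis.
    """
    bases = list(seq)
    for pos, original in methylated_sites.items():
        if original not in CONVERSION_PRODUCT:
            raise ValueError(f"original base must be C or G, got {original!r}")
        # Only revert when the read carries the conversion product.
        if 0 <= pos < len(bases) and bases[pos] == CONVERSION_PRODUCT[original]:
            bases[pos] = original
    return "".join(bases)
```

The repaired reads would then be passed to whatever mutation detection tool is used downstream; that step is outside the scope of this sketch.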
Another embodiment of the present application further provides an apparatus for simultaneously detecting DNA methylation and genomic variation, comprising: a memory for storing a program; and a processor for implementing the following method by executing the program stored in the memory: a methylation site repair step: restoring the methylation sites to the original C bases or G bases according to the DNA methylation information of the sample to be detected, to obtain repaired sequencing data of the sample to be detected; and a mutation detection step: using the repaired sequencing data of the sample to be detected as the input sample for mutation detection, and performing mutation detection on the sample to be detected to obtain genomic variation information of the sample to be detected.
In one implementation of this embodiment, the apparatus further includes a sequencing module, and the sequencing module is configured for:
extracting DNA from the sample to be detected, performing TAPS conversion treatment, and constructing a DNA sequencing library of the sample to be detected from the TAPS-converted DNA;
and performing high-depth sequencing on the DNA sequencing library of the sample to be detected to obtain sequencing data of the sample to be detected.
In one implementation of this embodiment, the apparatus further comprises a methylation and mutation differentiation module;
The methylation and mutation differentiation module is configured to separately count the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected according to the sequencing data obtained by the sequencing module, and to distinguish methylation status from mutation according to the base change patterns in the forward template and the reverse template.
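As a rough illustration of the counting performed by this module, the following Python sketch tallies the base changes of one aligned read against the reference. The function name `count_base_changes` and the `"C>T"` notation are hypothetical, and the sketch assumes a gap-free alignment in which the read and the reference segment have equal length; handling of insertions, deletions and base qualities is omitted.

```python
from collections import Counter


def count_base_changes(reference: str, read: str) -> Counter:
    """Count base changes between an aligned read and its reference segment.

    Each mismatch is recorded as "<reference base>><read base>", e.g. "C>T".
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    return Counter(
        f"{ref}>{obs}"
        for ref, obs in zip(reference.upper(), read.upper())
        if ref != obs
    )
```

Applying the function separately to reads derived from F1, F2, R1 and R2 gives both the number and the types of base changes per template, which are the two quantities the base change patterns compare.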
Another embodiment of the present application also provides a computer-readable storage medium having a program stored thereon, the program being executable by a processor to implement the following method: a methylation site repair step: restoring the methylation sites to the original C bases or G bases according to the DNA methylation information of the sample to be detected, to obtain repaired sequencing data of the sample to be detected; and a mutation detection step: using the repaired sequencing data of the sample to be detected as the input sample for mutation detection, and performing mutation detection on the sample to be detected to obtain genomic variation information of the sample to be detected.
The present application is described in further detail below with reference to specific embodiments and the attached drawings. The following examples are intended to be illustrative of the present application only and should not be construed as limiting the present application.
Example 1
Extracting DNA of a plasma sample, performing TAPS conversion treatment, and establishing a DNA library of the sample to be detected by using the DNA of the sample to be detected after the TAPS conversion treatment; performing high-depth sequencing on a DNA library of a sample to be tested to obtain sequencing data of the sample to be tested; repairing the methylation sites to be original C bases or G bases according to DNA methylation information of the sample to be detected, and obtaining repaired sequencing data of the sample to be detected; and taking the repaired sequencing data of the sample to be detected as an input sample of the mutation detection, and carrying out the mutation detection on the sample to be detected to obtain the genome mutation information of the sample to be detected.
Specifically, if C is changed to T in F1 and G is changed to A in F2 in the sequencing data of the sample to be detected, methylation modification exists at the site of base change in F1 and at the site complementary to the site of base change in F2; if C is changed to T in both F1 and F2, the sample to be detected has no methylation, and mutations exist at the site of base change in F1 and at the site complementary to the site of base change in F2; if F1 shows both C changed to T and G changed to A while F2 shows G changed to A, methylation modification exists at the site of base change in F1, and a mutation exists at the site complementary to the site of base change in F2; if only F1 shows C changed to T, with G changed to A in R2, methylation modification exists at the site of base change in F1; if only F1 shows C changed to A, with G changed to T in R2, oxidative damage exists at the site of base change in F1.
In this embodiment, methylation information, including methylation sites and methylation rates, is determined from the different base change information of F1, F2, R1 and R2 in the sequencing data, so that the methylation sites in the sequencing data can be repaired in advance. This prevents false-positive mutation calls caused by methylation sites in subsequent mutation detection, improves the accuracy and precision of genomic variation detection, and yields multi-omics genetic variation information of the sample to be detected while preserving a more complete original sample.
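The present application does not state how the methylation rate is calculated. A common convention, assumed here purely for illustration, is the fraction of reads covering a site that show the TAPS conversion product. The function name `methylation_rate` and its parameters are hypothetical.

```python
from typing import Optional


def methylation_rate(converted_reads: int, unconverted_reads: int) -> Optional[float]:
    """Fraction of covering reads that show the conversion product at a site.

    Returns None when no read covers the site, so that an uncovered site
    is not mistaken for an unmethylated one.
    """
    if converted_reads < 0 or unconverted_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = converted_reads + unconverted_reads
    if total == 0:
        return None
    return converted_reads / total
```

Under this assumed definition, a site covered by 3 converted reads and 1 unconverted read would have a methylation rate of 0.75.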
The present application has been described with reference to specific examples, which are provided only to aid understanding of the present application and are not intended to limit the present application. For a person skilled in the art to which the application pertains, several simple deductions, modifications or substitutions may be made according to the idea of the application.

Claims (7)

1. A method for simultaneously detecting DNA methylation and genomic variation, comprising:
a methylation and mutation differentiation step: separately counting the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected according to the sequencing data of the sample to be detected, and distinguishing methylation status from mutation according to the base change patterns in the forward template and the reverse template, to obtain DNA methylation information of the sample to be detected;
a methylation site repair step: restoring the methylation sites to the original C bases or G bases according to the DNA methylation information of the sample to be detected, to obtain repaired sequencing data of the sample to be detected;
and a variation detection step: using the repaired sequencing data of the sample to be detected as the input sample for variation detection, and performing variation detection on the sample to be detected to obtain genome variation information of the sample to be detected;
wherein the base change patterns in the forward template and the reverse template specifically comprise:
a double-stranded methylation pattern: F1, F2, R1 and R2 all have base changes, and the number of base changes in F1 and F2 is the same but the types are different;
a mutation-only pattern: F1, F2, R1 and R2 all have base changes, the base changes in F1 and F2 are the same, and the base changes in R1 and R2 are the same;
a concurrent methylation and mutation pattern: F1, F2, R1 and R2 all have base changes, the base changes in F1 and F2 are inconsistent, and the number of base changes in F1 differs from the number of base changes in F2;
a single-stranded methylation pattern: only one of F1 and R1 has a base change, and the base change is C to T;
an oxidative damage pattern: only one of F1 and R1 has a base change, and the base change is C to A;
wherein F1 is a forward template in a DNA sequencing library of a sample to be detected, R1 is a reverse template in the DNA sequencing library of the sample to be detected, F2 is a forward template complementary to R1, and R2 is a reverse template complementary to F1.
2. The method for simultaneous detection of DNA methylation and genomic variation according to claim 1, wherein the methylation site repair step is preceded by a sequencing step comprising:
extracting DNA of a sample to be detected, carrying out TAPS conversion treatment, and establishing a DNA sequencing library of the sample to be detected by adopting the DNA of the sample to be detected after the TAPS conversion treatment;
and performing high-depth sequencing on the DNA sequencing library of the sample to be tested to obtain sequencing data of the sample to be tested.
3. The method according to claim 2, wherein the TAPS conversion treatment comprises converting methylation-modified cytosine in the DNA of the sample to be detected into thymine.
4. An apparatus for simultaneously detecting DNA methylation and genomic variations, the apparatus comprising:
a methylation and mutation differentiation module, configured to separately count the base change patterns in the forward template and the reverse template of the DNA of the sample to be detected according to the sequencing data of the sample to be detected, and to distinguish methylation status from mutation according to the base change patterns in the forward template and the reverse template, to obtain DNA methylation information of the sample to be detected;
a methylation site repair module, configured to restore the methylation sites to the original C bases or G bases according to the DNA methylation information of the sample to be detected, to obtain repaired sequencing data of the sample to be detected;
and a variation detection module, configured to use the repaired sequencing data of the sample to be detected as the input sample for variation detection, and to perform variation detection on the sample to be detected to obtain genome variation information of the sample to be detected;
wherein the base change patterns in the forward template and the reverse template specifically comprise:
a double-stranded methylation pattern: F1, F2, R1 and R2 all have base changes, and the number of base changes in F1 and F2 is the same but the types are different;
a mutation-only pattern: F1, F2, R1 and R2 all have base changes, the base changes in F1 and F2 are the same, and the base changes in R1 and R2 are the same;
a concurrent methylation and mutation pattern: F1, F2, R1 and R2 all have base changes, the base changes in F1 and F2 are inconsistent, and the number of base changes in F1 differs from the number of base changes in F2;
a single-stranded methylation pattern: only one of F1 and R1 has a base change, and the base change is C to T;
an oxidative damage pattern: only one of F1 and R1 has a base change, and the base change is C to A;
wherein F1 is a forward template in a DNA sequencing library of a sample to be detected, R1 is a reverse template in the DNA sequencing library of the sample to be detected, F2 is a forward template complementary to R1, and R2 is a reverse template complementary to F1.
5. The apparatus for simultaneous detection of DNA methylation and genomic variation according to claim 4, further comprising a sequencing module, the sequencing module comprising:
extracting DNA of a sample to be detected, carrying out TAPS conversion treatment, and establishing a DNA sequencing library of the sample to be detected by adopting the DNA of the sample to be detected after the TAPS conversion treatment;
and performing high-depth sequencing on the DNA sequencing library of the sample to be tested to obtain sequencing data of the sample to be tested.
6. An apparatus for simultaneously detecting DNA methylation and genomic variations, comprising:
a memory for storing a program;
a processor for implementing the method of any one of claims 1-3 by executing a program stored by the memory.
7. A computer-readable storage medium, characterized in that the medium has stored thereon a program which is executable by a processor to implement the method according to any one of claims 1-3.
CN202011598472.3A 2020-12-29 2020-12-29 Method, device and storage medium for simultaneously detecting DNA methylation and genome variation Active CN112634984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011598472.3A CN112634984B (en) 2020-12-29 2020-12-29 Method, device and storage medium for simultaneously detecting DNA methylation and genome variation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011598472.3A CN112634984B (en) 2020-12-29 2020-12-29 Method, device and storage medium for simultaneously detecting DNA methylation and genome variation

Publications (2)

Publication Number Publication Date
CN112634984A CN112634984A (en) 2021-04-09
CN112634984B true CN112634984B (en) 2021-09-28

Family

ID=75287500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011598472.3A Active CN112634984B (en) 2020-12-29 2020-12-29 Method, device and storage medium for simultaneously detecting DNA methylation and genome variation

Country Status (1)

Country Link
CN (1) CN112634984B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674802B (en) * 2021-08-20 2022-09-09 深圳吉因加医学检验实验室 Method and device for performing variation detection based on methylation sequencing data
CN115410649B (en) * 2022-04-01 2023-03-28 北京吉因加医学检验实验室有限公司 Method and device for simultaneously detecting methylation and mutation information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111755072A (en) * 2020-08-04 2020-10-09 深圳吉因加医学检验实验室 Method and device for simultaneously detecting methylation level, genome variation and insertion fragment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130157266A1 (en) * 2009-03-15 2013-06-20 Ribomed Biotechnologies, Inc. Abscription based molecular detection of dna methylation

Also Published As

Publication number Publication date
CN112634984A (en) 2021-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant