WO2020252749A1 - Method for constructing a sequencing library based on a DNA sample and application thereof - Google Patents

Method for constructing a sequencing library based on a DNA sample and application thereof

Info

Publication number
WO2020252749A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
sequencing
methylation
dna sample
sequence
Prior art date
Application number
PCT/CN2019/092116
Other languages
English (en)
French (fr)
Inventor
杨林
王其伟
杨心石
于源
杨娟
张艳艳
陈芳
蒋慧
Original Assignee
深圳华大智造科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大智造科技有限公司
Priority to CN201980092842.5A priority Critical patent/CN113544282B/zh
Priority to EP19933969.8A priority patent/EP3988665B1/en
Priority to PCT/CN2019/092116 priority patent/WO2020252749A1/zh
Publication of WO2020252749A1 publication Critical patent/WO2020252749A1/zh
Priority to US17/545,724 priority patent/US20220090059A1/en

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1068 Template (nucleic acid) mediated chemical library synthesis, e.g. chemical and enzymatical DNA-templated organic molecule synthesis, libraries prepared by non ribosomal polypeptide synthesis [NRPS], DNA/RNA-polymerase mediated polypeptide synthesis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • The invention relates to the field of gene sequencing, in particular to a method for constructing a sequencing library based on DNA samples and its application.
  • DNA methylation is an epigenetic regulatory modification that participates in the regulation of protein synthesis without changing the base sequence.
  • DNA methylation is a remarkable chemical modification. Family care, aging, smoking, alcohol abuse and even obesity are all faithfully recorded on the genome through methylation. The genome is like a diary, and methylation is the text that records the experiences of the human body.
  • DNA methylation is an important epigenetic mark. Obtaining methylation-level data for every C site in the whole genome is of great significance for studying the spatio-temporal specificity of epigenetics.
  • Mapping the DNA methylation level of the whole genome and analyzing high-precision methylation modification patterns of specific species will be a milestone in epigenomics research.
  • an object of the present invention is to provide a method for constructing a sequencing library based on DNA samples.
  • With this method, a sequencing library can be constructed from methylated DNA samples, and the resulting library meets the needs of whole-genome methylation sequencing or methylation sequencing of specific regions.
  • Whole Genome Bisulfite Sequencing (WGBS), one of the most commonly used methods for studying methylation, can cover all methylation sites and yield a comprehensive methylation profile. However, it still faces many challenges in high-throughput sequencing.
  • The main problems are as follows. First, bisulfite treatment converts unmethylated C bases into U bases, which drastically changes the GC content of the whole genome and introduces strong bias in subsequent amplification. Second, bisulfite-treated data are usually harder to analyze, for example because the majority of cytosines (C) in the genome are changed by the bisulfite treatment.
  • WGBS is a good method for DNA methylation research, but its inherent design flaws, its detection bias and the problems encountered in bioinformatics analysis greatly hinder its application.
  • The inventors found during their research that, when constructing and sequencing libraries from DNA methylation samples, an improved whole-genome methylation sequencing method can reduce the bias at high-CG regions and increase the effectiveness of alignment, which ensures accurate detection of DNA methylation information.
  • An endonuclease and a polymerase are used to introduce methylation-modified cytosine into the DNA template strand, producing a mixed DNA strand that contains both the original template and a newly generated template.
  • The original template in the mixed DNA strand carries the cytosine methylation information of the original DNA. All cytosines in the newly generated template are newly incorporated, methylation-modified cytosines, so this part preserves the original DNA sequence information under bisulfite treatment.
  • Under bisulfite treatment, unmethylated cytosine (C) in the original template is converted to uracil (U), whereas the cytosines in the newly generated template are fully methylated and remain unchanged. After bisulfite treatment, one part of each DNA strand therefore retains DNA methylation information and the other part retains the original DNA sequence information, forming a mixed DNA fragment that carries both. Based on these fragments, a sequencing library can be constructed for whole-genome methylation sequencing.
  • Sequence capture technology selectively enriches specific regions of the genome: the region of interest is separated from the genome by an appropriate method and then sequenced, which enables targeted genomics research and reduces cost.
  • For probe capture, various companies such as Agilent and Roche have developed capture products for target-region methylation. Agilent first captures the target region of interest and then treats the sample with bisulfite.
  • The disadvantage of constructing the library after capture is that the sample cannot be enriched before capture, which is a major challenge for samples with low input. Roche adopts a strategy of bisulfite treatment followed by enrichment, and then captures with designed probes. Because the probes are designed against bisulfite-treated DNA, the design has to cover both the methylated and the unmethylated state of each cytosine; probe design is therefore expensive, and the large number of variable probes greatly reduces capture specificity.
  • the present invention provides the following technical solutions:
  • The present invention provides a method for constructing a sequencing library based on a DNA sample, comprising: cutting the DNA sample with an endonuclease to obtain a DNA sample with single-stranded nicks;
  • polymerizing on the DNA sample with single-stranded nicks, using a polymerase, dATP, dTTP, dGTP and methylation-modified dCTP (5mC), to obtain mixed DNA, the mixed DNA comprising two reverse-complementary strands, wherein the 5' end of each strand is the original sequence of the DNA sample, the 3' end of each strand is a synthetic sequence, and every base C in the 3' end of each strand is methylation-modified;
  • performing bisulfite treatment on the mixed DNA to obtain converted mixed DNA; and amplifying the converted mixed DNA to obtain a sequencing library.
  • The present invention uses an endonuclease and a polymerase to introduce methylation-modified cytosine into the DNA template strand, producing a mixed DNA strand that contains the original template and a newly generated template.
  • The original template in the mixed DNA strand carries the cytosine methylation information of the original DNA. All cytosines in the newly generated template are newly incorporated, methylation-modified cytosines, so this part preserves the original DNA sequence information under bisulfite treatment.
  • Under bisulfite treatment, unmethylated cytosine (C) in the original template is converted to uracil (U), whereas the cytosines in the newly generated template are fully methylated and therefore remain unchanged.
  • After treatment, one part of each DNA strand retains DNA methylation information and the other part retains the original DNA sequence information, forming a mixed DNA fragment whose 5' end retains DNA methylation information and whose 3' end retains the original DNA sequence information. Based on these fragments, a sequencing library can be constructed and used for whole-genome methylation sequencing, multiplex PCR targeted sequencing or probe capture sequencing.
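The effect of bisulfite treatment on one strand of the mixed DNA can be illustrated with a small in-silico sketch. It is not part of the patent: the function names, the example sequences and the set of methylated positions are hypothetical, and converted uracil is written as T, which is how it is read after PCR.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Turn every unmethylated C into T (U is read as T after PCR).

    methylated: 0-based indices of cytosines that carry 5mC and are protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def convert_hybrid_strand(original_5p, synthetic_3p, methylated_5p=frozenset()):
    """Model one strand of the mixed DNA after bisulfite treatment.

    original_5p   : 5' part copied from the sample (native methylation state)
    synthetic_3p  : 3' part synthesised with 5mC-dCTP (every C is methylated)
    methylated_5p : hypothetical indices of methylated C in original_5p
    """
    converted_5p = bisulfite_convert(original_5p, methylated_5p)
    # Every C in the synthetic part is methylated, so all of them are protected.
    protected = {i for i, base in enumerate(synthetic_3p.upper()) if base == "C"}
    converted_3p = bisulfite_convert(synthetic_3p, protected)
    return converted_5p, converted_3p


five, three = convert_hybrid_strand("ACGTCCGA", "TTCGGCAC", methylated_5p={1})
print(five)   # 5' part: only the methylated C at index 1 survives
print(three)  # 3' part: identical to the input sequence
```

In this toy example the 5' part reports methylation (a retained C marks a methylated site) while the 3' part still matches the reference sequence, which is the property the description relies on for alignment.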
  • In the resulting library, half of each molecule carries the bisulfite-converted base information and half retains the original DNA base information, which offsets the extreme base-composition bias that bisulfite treatment imposes on the template.
  • This effectively reduces the amplification bias of methylation libraries at CpG islands during subsequent PCR, and it is equivalent to preparing a WGBS library and a WGS library in a single library construction.
  • The retained DNA information makes it possible to locate reads accurately on the genome and improves the accuracy of methylation alignment. The procedure is also simplified: fragmentation, end repair and A-tailing are completed in one step.
  • Multiplex PCR capture technology can be developed based on the mixed-strand library.
  • One PCR primer in this capture technology is designed on the DNA sequence that retains the methylation information, and the other is designed on the DNA sequence that retains the original DNA sequence information. This avoids the primer dimers that arise when conventional methylation primers are designed against converted DNA, and gives higher specificity than conventional methylation primers.
  • Probe capture technology can also be developed. The probes are designed against the sequence that retains the original DNA sequence information, which greatly reduces the difficulty of probe design compared with designing against the converted DNA sequence.
  • the method for constructing a sequencing library based on DNA samples may further include the following technical features:
  • the endonuclease is at least one of DNaseI or DNaseII, or any endonuclease capable of producing the single-stranded nick.
  • the polymerase is BST polymerase, phi29 polymerase, Klenow polymerase, or any polymerase capable of DNA polymerization.
  • the length of the DNA sample with a single-stranded nick is 100-1000 bp.
  • the method further includes: ligating a methylated sequencing adapter to the mixed DNA, and performing bisulfite treatment or another treatment capable of converting methylation information, to obtain converted mixed DNA;
  • the methylated sequencing adapter contains a first universal sequence and a second universal sequence;
  • universal primers that match the first universal sequence and the second universal sequence are used for amplification to obtain a sequencing library.
  • The 5' end of the converted mixed DNA strand is the treated DNA sequence, in which all unmethylated cytosines are converted to U bases; the 3' end is the newly synthesized DNA sequence, in which all cytosines are methylation-modified, so the original DNA sequence information remains unchanged during conversion. A sequencing library constructed in this way enables whole-genome methylation sequencing.
  • the methylated sequencing adapter is a methylated adapter for any of the MGI, Illumina, Proton or other sequencing platforms.
  • the DNA sample is a whole genome DNA sample.
  • the method further includes:
  • the mixed DNA is directly subjected, without adapter ligation, to bisulfite treatment or another treatment capable of converting methylation information, to obtain converted mixed DNA. The 5' end of the converted mixed DNA strand is the converted DNA sequence, in which all unmethylated cytosines are converted to U bases; the 3' end is the newly synthesized DNA sequence, in which all cytosines are methylation-modified and the original DNA sequence information remains unchanged.
  • specific primers are used to perform the amplification so as to obtain a sequencing library based on the target region of the DNA sample.
  • the specific primers include a first specific primer and a second specific primer; the first specific primer is identical to the 5'-end sequence of the converted mixed DNA, and the second specific primer is complementary to the 3'-end sequence of the converted mixed DNA.
  • one of the specific primers is designed against the DNA sequence that retains methylation information,
  • and the other specific primer is designed against the original DNA sequence;
  • one primer is rich in A, T and G,
  • while the other contains A, T, C and G, which reduces the primer-dimer problem encountered in methylation multiplex PCR.
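The primer geometry described above can be sketched as follows. This is an illustrative assumption rather than the patent's design procedure: the helper names and the fixed primer length are hypothetical, and the forward primer site is assumed to contain no methylated cytosine, so every C in it reads as T after conversion.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sketch_primer_pair(original_5p, original_3p, primer_len=20):
    """Derive one primer pair for a mixed-DNA strand.

    original_5p : 5' part of the strand before conversion
    original_3p : 3' part of the strand (fully methylated, unchanged by bisulfite)
    primer_len  : hypothetical primer length, not a value from the patent
    """
    # Assumption: no methylated C in the forward primer site, so all C -> T.
    converted_5p = original_5p.upper().replace("C", "T")
    forward = converted_5p[:primer_len]  # identical to the converted 5' end
    reverse = reverse_complement(original_3p[-primer_len:])  # complementary to the 3' end
    return forward, reverse


fwd, rev = sketch_primer_pair("ACCTGACTGA", "GGCATCGTAC", primer_len=6)
print(fwd, rev)
```

The forward primer uses only A, T and G, while the reverse primer keeps all four bases because it targets the unconverted 3' part; this asymmetry is what the text credits with fewer primer dimers.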
  • the method further includes:
  • a probe is used for hybridization capture and elution to obtain a hybridization product.
  • the probe captures the 3'-end sequence of the converted mixed DNA, that is, the template strand whose DNA sequence information remains unchanged after bisulfite treatment; a sequencing library is obtained by amplifying the hybridization product.
  • the present invention provides a method for sequencing a DNA sample, including:
  • a sequencing library is obtained by using the method described in any one of the embodiments of the first aspect of the present invention.
  • the sequencing library is sequenced to obtain the sequencing result of the DNA sample.
  • the sequencing is paired-end sequencing or single-end sequencing.
  • the present invention provides a method for determining the methylation status of a DNA sample, including:
  • the DNA sample is sequenced to obtain a sequencing result;
  • based on the sequencing result, the alignment positions are analyzed to determine the methylation status of the DNA sample.
  • the method for determining the methylation status of a DNA sample described above may further include the following technical features:
  • the method for determining the methylation status of a DNA sample further includes:
  • when the 5' end corresponds to one candidate position, the 3' end corresponds to multiple candidate positions, and the vicinity of the candidate position of the 5' end includes one of the multiple candidate positions of the 3' end, the position corresponding to the 5' end shall prevail;
  • when the 3' end corresponds to one candidate position, the 5' end corresponds to multiple candidate positions, and the vicinity of the candidate position of the 3' end includes one of the multiple candidate positions of the 5' end, the position corresponding to the 3' end shall prevail;
  • when the 3' end corresponds to one candidate position, the 5' end corresponds to one candidate position, and the vicinity of the candidate position of the 3' end overlaps the vicinity of the candidate position of the 5' end, the position corresponding to the 3' end or the 5' end shall prevail.
  • Other cases are treated as multiple alignments: the alignment position of the reads cannot be determined accurately, but the 3'-end alignment position can be used as the primary alignment position.
  • BWA software is used to align the 3' end to the reference genome,
  • and BS-map software is used to align the 5' end to the reference genome.
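The position rules above can be expressed as a small decision function. This is a minimal sketch, not the patent's software: candidate positions are assumed to be `(chromosome, coordinate)` tuples already extracted from the BS-map and BWA output, the function and label names are hypothetical, and the 1000 bp `window` is an assumed reading of "vicinity" (the description elsewhere mentions 100-1000 bp).

```python
def resolve_position(pos_5p, pos_3p, window=1000):
    """Choose the alignment position for one molecule.

    pos_5p : candidate positions of the 5' (bisulfite-converted) end
    pos_3p : candidate positions of the 3' (original-sequence) end
    window : assumed maximum distance in bp for two positions to be "near"
    Returns (position, label); position is None if the 3' end has no candidate.
    """
    def near(a, b):
        return a[0] == b[0] and abs(a[1] - b[1]) <= window

    if len(pos_5p) == 1 and len(pos_3p) > 1:
        # Unique 5' end confirmed by one of the 3' candidates: 5' end prevails.
        if any(near(pos_5p[0], p) for p in pos_3p):
            return pos_5p[0], "5p_unique"
    elif len(pos_3p) == 1 and len(pos_5p) > 1:
        # Unique 3' end confirmed by one of the 5' candidates: 3' end prevails.
        if any(near(pos_3p[0], p) for p in pos_5p):
            return pos_3p[0], "3p_unique"
    elif len(pos_5p) == 1 and len(pos_3p) == 1:
        # Both unique and close together: either end may be used; 3' chosen here.
        if near(pos_5p[0], pos_3p[0]):
            return pos_3p[0], "both_unique"
    # Everything else is a multiple alignment; the 3' end serves as primary.
    return (pos_3p[0] if pos_3p else None), "multi"


print(resolve_position([("chr1", 100)], [("chr1", 350), ("chr2", 900)]))
```

Choosing the 3' end in the "both unique" case is a design choice made for this sketch; the text allows either end.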
  • the present invention provides a kit comprising: an endonuclease, a nucleic acid amplification reagent, a methylation modified dCTP and a methylation detection reagent.
  • the kit further includes a first specific primer and a second specific primer
  • the first specific primer includes SEQ ID NO: 7 to SEQ ID NO: 16
  • the second specific primer includes SEQ ID NO: 17 to SEQ ID NO: 26.
  • the kit further includes a probe for capturing the target sequence and constructing the target region nucleic acid library.
  • the present invention provides a double-stranded DNA comprising two reverse-complementary strands, wherein each strand includes a 5'-end sequence and a 3'-end sequence, and every base C in the 3' end of each strand is methylation-modified.
  • the 5'-end sequence of the double-stranded DNA is a sequence that retains methylation information. It can be a sequence in which all unmethylated cytosines have been converted to U bases by bisulfite treatment or by enzymatic treatment.
  • the C bases of the 3'-end sequence are all methylation-modified, so the cytosine information remains unchanged during conversion.
  • the length of the double-stranded DNA is 100-1000 bp.
  • Fig. 1 is a flowchart of DNA methylation hybrid library construction according to an embodiment of the present invention.
  • Fig. 2 is a flowchart of DNA methylation hybrid multiplex PCR according to an embodiment of the present invention.
  • Fig. 3 is a diagram of quality inspection results of a methylated DNA mixed library provided according to an embodiment of the present invention.
  • Fig. 4 is a graph of the alignment-rate results of different methods provided according to an embodiment of the present invention.
  • Fig. 5 is a graph showing the coverage results of CpG sites on regions with different GC content by different methods according to embodiments of the present invention.
  • Fig. 6 is the coverage results of different methods on whole genes according to the embodiments of the present invention.
  • Fig. 7 is a result diagram of the sequencing depth of each amplicon provided according to an embodiment of the present invention.
  • Fig. 8 is a flowchart of DNA methylation hybrid library capture according to an embodiment of the present invention.
  • Fig. 9 is a graph of comparison results of methylation rates of target sites according to an embodiment of the present invention.
  • the base N or n when representing a base, can be any base A, T, C or G.
  • the present invention provides a method for constructing a sequencing library based on a DNA sample, which includes: (1) digesting the DNA sample with an endonuclease to obtain a DNA sample with single-stranded nicks; (2) polymerizing on the nicked DNA sample using a polymerase, dATP, dTTP, dGTP and methylation-modified dCTP to obtain mixed DNA, wherein the 5' end of each strand of the mixed DNA is the original sequence of the DNA sample, the 3' end of each strand is a synthetic sequence, and every C base in the 3' end of each strand is methylation-modified; (3) subjecting the mixed DNA to bisulfite treatment to obtain converted mixed DNA; and (4) amplifying the converted mixed DNA to obtain a sequencing library.
  • DNA samples are cut with an endonuclease such as DNase I to produce random single-stranded nicks, each with a phosphate group at the 5' end and a hydroxyl group at the 3' end.
  • In a mixture of methylation-modified dCTP and normal dATP, dTTP and dGTP, BST polymerase extends from the 3' end of the nick and displaces the nicked strand, producing a mixed DNA fragment composed of original DNA plus newly generated DNA.
  • The original DNA retains the original methylation information; the C bases in the newly generated DNA are all methylation-modified, so the original DNA sequence information is retained under bisulfite or enzymatic treatment.
  • the DNA sample may be genomic DNA.
  • Besides DNase I, the single-strand nicks can also be produced by other random endonucleases such as DNase II, by restriction endonucleases, or by other endonucleases.
  • the length of the DNA sample can be controlled between 100-1000bp.
  • The polymerase is used with a 5mC dNTP mix (5m-dCTP, dATP, dTTP and dGTP in equimolar amounts),
  • and a base A is added to the 3' end of the newly generated double-stranded DNA.
  • The polymerase can also be another strand-displacing polymerase such as phi29, a polymerase with 5'-3' exonuclease activity and terminal A-adding activity such as Klenow, or any other DNA polymerase with or without A-adding, strand-displacement or 5'-3' exonuclease activity.
  • In the resulting mixed DNA, the cytosines in the 5' part of each DNA strand retain the original methylation information, and all cytosines in the 3' part are methylation-modified cytosines.
  • A methylated sequencing adapter is ligated to the resulting mixed DNA. Under bisulfite treatment, unmethylated C bases in the former are converted to U bases while methylated C bases remain unchanged, retaining the original methylation information; the methylated C bases in the latter all remain unchanged, retaining the original DNA sequence information.
  • PCR amplification then yields a sequencing library that retains both methylation information and original DNA information.
  • The library can be subjected to high-throughput sequencing to obtain DNA methylation information and the original DNA sequence information.
  • the corresponding methylated sequencing adapter can be any methylated sequencing adapter of MGI, Illumina, Proton, or other sequencing platforms. Accordingly, these platforms can be used to perform high-throughput sequencing on the obtained sequencing library.
  • high-throughput sequencing can be paired-end or single-end sequencing, and paired-end sequencing is preferred.
  • One of the reads contains the sites that carry bisulfite conversion information: unmethylated cytosine has been converted to thymine, so this read is used to determine the methylation sites. The other read retains the original DNA information and is used to assist positioning and alignment, so that genome methylation information and genomic DNA sequence information are both obtained accurately.
  • The nucleic acid sequence analysis and alignment method is paired-end analysis.
  • The reads containing the bisulfite-converted sites are aligned with methylation-aware software such as BS-map to obtain their positions on the genome, and the reads that retain the original information are aligned against the whole genome with software such as BWA to obtain their positions on the genome.
  • the former position shall prevail; 2) if the former has multiple positions and the latter also has multiple positions, the position that is shared by both and not far apart shall prevail; if there are several such positions, the best alignment position is chosen; 3) if the former maps to one position and the latter to multiple positions, and a position near the former (within 100-1000 bp) is a candidate position of the latter, the former position shall prevail. After the optimal alignment is selected, PCR duplicates are removed, the genome information and genome methylation information are analyzed, and the genome base mutation frequency and genome methylation rate are calculated.
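The methylation rate counted in the last step can be illustrated with the usual per-site calculation for bisulfite data. This sketch is an assumption about how the rate is computed, since the text does not give a formula: the function name is hypothetical, and the input is taken to be the base calls that the bisulfite-converted reads show at one reference cytosine after duplicate removal.

```python
from collections import Counter


def methylation_rate(base_calls):
    """Fraction of methylated reads at one reference cytosine.

    base_calls : iterable of bases observed at the site in converted reads.
                 C = protected (methylated), T = converted (unmethylated);
                 other bases are treated as uninformative and ignored.
    Returns None when no informative read covers the site.
    """
    counts = Counter(base.upper() for base in base_calls)
    informative = counts["C"] + counts["T"]
    if informative == 0:
        return None
    return counts["C"] / informative


print(methylation_rate("CCCT"))  # three of four informative reads are methylated
```

Calls other than C or T at the same position are the kind of evidence that would instead feed the base mutation frequency mentioned in the text.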
  • one or more pairs of PCR amplification primers for the gene loci of interest are designed; one primer of each pair is positioned in the region that retains DNA methylation information and the other in the region that retains the original DNA information, and PCR amplification is performed to obtain the sequence of the locus of interest for methylation analysis.
  • The amplified product can be used for electrophoresis, Sanger sequencing or high-throughput sequencing.
  • One primer is located at a sequence position where the cytosines are methylation-modified, and the other primer is located at a sequence position where unmethylated cytosines have been converted to thymine; PCR amplification is performed to obtain the sequence of the locus of interest for methylation analysis.
  • The original DNA sequence near the methylation site of interest is hybridized with the probe, so that the entire DNA molecule is captured.
  • The target-site methylation capture library is obtained by magnetic-bead adsorption and elution, and the library obtained after PCR amplification is subjected to high-throughput sequencing.
  • The bisulfite-treated DNA can be enriched and amplified to increase the capture input. There is no need to design probes covering every methylation state, which reduces the number of probe types and improves the specificity of probe capture.
  • the probes can be designed as DNA probes or RNA probes, liquid or solid phase probes.
  • the length of the probe can be 60-120 nt.
  • The probes are designed against the original DNA sequence and carry biotin or other modifications for subsequent separation and purification; probes designed by other methods can also be used. This method is compatible with all existing types of probes for DNA sequence capture. The probe hybridizes to and captures the bisulfite-treated template, half of which retains DNA methylation information and half of which retains DNA sequence information; the DNA probe binds the portion that retains the DNA sequence information (preferably obtained by the above scheme).
  • The DNA obtained after hybridization is captured with streptavidin-coated or otherwise modified magnetic beads and eluted, and the eluted product is amplified by PCR to obtain a library ready for sequencing.
  • Example 1 Whole genome methylation library construction and sequencing
  • the sequencing type is PE100, and the sequencing depth is 30X.
  • Data analysis covers data utilization, alignment rate, GC bias and other performance metrics.
  • the experimental process is as follows:
  • NEB DNase I (Cat. No. 0303S) and BST polymerase (Cat. No. M0374S) were used to perform end repair and A-tailing on the product.
  • the reaction system and conditions are as follows:
  • 5mC dNTP mix denotes a mixture of methylation-modified dCTP and normal dATP, dTTP and dGTP.
  • Adapter 2 (SEQ ID NO: 2): 5'AGCCAAGGTCAGTAACGACATGGCTACGATCCGACTT
  • The C bases in the sequences of adapters 1 and 2 are protected by methylation modification, and the N bases are the sample tag sequence.
  • CT Conversion Reagent solution: take the CT Conversion Reagent (solid mixture) out of the kit, add 900 µL of water, 50 µL of M-Dissolving Buffer and 300 µL of M-Dilution Buffer, and dissolve at room temperature with shaking for 10 minutes, or shake on a shaker for 10 minutes.
  • The amplification enzyme system is KAPA HiFi HotStart Uracil+ ReadyMix (2X) from KAPA, Cat. No. KK2801.
  • a Bioanalyzer analysis system (Agilent, Santa Clara, USA) was used to detect the size and content of the inserted fragments of the library; thus, the constructed high-throughput sequencing library of the specific region of the genome of the sample was tested.
  • In Table 1, the number of covered CpG sites refers to the number of sites with a depth of 1X or more;
  • coverage refers to the proportion of CpG sites with a depth of 1X or more among all CpG sites;
  • 10X coverage refers to the proportion of CpG sites with a depth of 10X or more among all CpG sites.
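The coverage definitions above translate directly into a short calculation. The sketch below assumes a hypothetical list holding the sequencing depth of every CpG site in the reference; the function name and the example depths are illustrative only and are not data from the patent.

```python
def cpg_coverage(depths, min_depth=1):
    """Count and proportion of CpG sites covered at or above min_depth.

    depths    : sequencing depth of every CpG site in the reference
    min_depth : 1 for plain coverage, 10 for 10X coverage
    """
    if not depths:
        return 0, 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered, covered / len(depths)


example_depths = [0, 3, 12, 25, 1, 0, 10, 9]  # made-up depths for eight CpG sites
print(cpg_coverage(example_depths, min_depth=1))
print(cpg_coverage(example_depths, min_depth=10))
```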
  • Primers were designed for 10 methylation sites: forward primers were designed upstream of each site against the bisulfite-treated genome sequence, and reverse primers were designed downstream of each site against the original genome sequence (sequence information as shown in Table 1). The primers were pooled to perform multiplex PCR on the methylated-DNA mixture after bisulfite treatment.
  • the N base is the molecular tag
  • the PCR reaction conditions are as follows:
  • The specific primer pool is an equimolar mixture of the above primers, with a final concentration of 10 µM.
  • The MGI exome capture kit (MGIeasy exome capture V4 probe reagent, item number 1000007745) was used for hybridization capture. The captured library was sequenced on an MGIseq-2000 sequencer with sequencing type PE100 and sequencing depth 100X, followed by data analysis covering data utilization, alignment rate, GC bias and other performance metrics.
  • The methylated adapter sequences are the same as in Example 1, that is, as shown in SEQ ID NO: 1 and SEQ ID NO: 2.
  • CT Conversion Reagent solution: take the CT Conversion Reagent (solid mixture) out of the kit, add 900 µL of water, 50 µL of M-Dissolving Buffer and 300 µL of M-Dilution Buffer, and dissolve at room temperature with shaking for 10 minutes, or shake on a shaker for 10 minutes.
  • The amplification enzyme system is KAPA HiFi HotStart Uracil+ ReadyMix (2X) from KAPA, Cat. No. KK2801.
  • the sequences of the universal primer 1 and the universal primer 2 are the same as in Example 1, that is, as shown in SEQ ID NO: 3 and SEQ ID NO: 4.
  • Briefly centrifuge the tube, place it on a magnetic stand, and let it stand for 2-5 minutes until the liquid is clear. Use a pipette to carefully aspirate the supernatant and discard it.
  • Briefly centrifuge the tube, place it on a magnetic stand, and let it stand for 30 seconds until the liquid is clear. Use a pipette to carefully aspirate the supernatant and discard it.
  • the comparison rate in Table 12 refers to the ratio of the comparison to the genome
  • the repetition rate refers to the ratio of reads measured at the same position
  • the capture rate refers to the ratio of reads compared to the target area to the total read.
  • the average depth refers to the average depth of the target area covered by sequencing
  • the 20x coverage refers to the proportion of the entire area covered by the target area to a depth of 20X or more.
  • the DNA sequence and the methylation sequence are compared, and the compared data (87.8%) obtained is then counted on the data (49.5%) falling in the exon region and the flanking region, and the target region is counted With an average depth (99.3X) and 20X coverage (95.2%), it can be seen that this method can effectively capture methylation.
  • first, second, etc. are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with “first” and “second” may explicitly or implicitly include at least one of the features.
  • “plurality” means at least two, such as two, three, etc., unless otherwise specifically defined.


Abstract

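A method for constructing a sequencing library based on a DNA sample, and its use, are provided. The method comprises: digesting the DNA sample with an endonuclease to obtain a DNA sample bearing single-strand nicks; performing a polymerization reaction on the nicked DNA sample using a polymerase, dATP, dTTP, dGTP and methylated dCTP to obtain hybrid DNA, the hybrid DNA comprising two reverse-complementary strands in which the 5' end of each strand is the original sequence of the DNA sample, the 3' end of each strand is a synthesized sequence, and the base C on the 3' end of each strand is methylated; subjecting the hybrid DNA to bisulfite treatment or another conversion treatment to obtain converted hybrid DNA; and amplifying the converted hybrid DNA to obtain a sequencing library. The library can be used for whole-genome methylation sequencing, multiplex PCR targeted sequencing or probe capture sequencing.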

Description

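Method for constructing a sequencing library based on a DNA sample, and use thereof
Priority information
None.
Technical field
The present invention relates to the field of gene sequencing, and in particular to a method for constructing a sequencing library based on a DNA sample and the use thereof.
Background
DNA methylation is an epigenetic regulatory modification that takes part in regulating the level of protein synthesis without altering the base sequence. In humans, DNA methylation is a remarkable chemical modification: the care of family members, aging, smoking, alcohol abuse and even obesity are all faithfully recorded on the genome by methylation. The genome is like a diary in which methylation is the script that records a person's experiences. DNA methylation is an important epigenetic mark, and obtaining methylation-level data for every C site across the whole genome is of great significance for studying the spatiotemporal specificity of epigenetics. Mapping genome-wide DNA methylation levels on next-generation high-throughput sequencing platforms, and analyzing high-accuracy methylation patterns in specific species, will be a milestone in epigenomics research and will lay a foundation for basic research into mechanisms such as cell differentiation and tissue development, as well as for animal and plant breeding and research on human health and disease.
However, both whole-genome bisulfite sequencing (WGBS) and sequencing of specific genomic regions face their own difficulties.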
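Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one object of the invention is to provide a method for constructing a sequencing library based on a DNA sample. With this method, a sequencing library can be constructed from a methylated DNA sample, and the resulting library can meet the needs of whole-genome methylation sequencing or of methylation sequencing of specific regions.
In the course of long-term research, the inventors observed the following:
Whole-genome bisulfite sequencing (WGBS) is the most commonly used approach for studying methylation in organisms. It can cover all methylation sites and yield a comprehensive methylation map. It nevertheless faces a number of challenges in high-throughput sequencing. First, bisulfite treatment converts unmethylated C bases to U, so the GC content of the whole genome changes drastically, which introduces strong bias in subsequent amplification. Second, bisulfite-treated data are usually difficult to analyze: because most cytosines (C) in the genome are converted to thymine (T) after bisulfite treatment, the base composition becomes unbalanced, reads map to the reference genome with limited efficiency, and excessive multi-mapping occurs. At some positions, DNA methylation information cannot be obtained even with greater sequencing throughput, so part of the genome-wide methylation information is lost.
Overall, WGBS is a good method for studying DNA methylation, but its inherent design limitations, its detection bias and the problems encountered in bioinformatic analysis greatly hinder its application. The inventors found that, when building and sequencing libraries from methylated DNA samples, an improved whole-genome methylation sequencing method can reduce the bias in high-GC regions, improve alignment efficiency and ensure accurate detection of DNA methylation information. For example, methylated cytosine is introduced into the DNA template strand by an endonuclease and a polymerase to produce a hybrid DNA strand containing both the original template and a newly synthesized template. The original template in the hybrid strand carries the cytosine methylation information of the original DNA. All cytosines in the newly synthesized template are newly incorporated methylated cytosines, which preserve the original sequence information of the DNA under bisulfite treatment. Under bisulfite treatment, unmethylated cytosine (C) in the original template is converted to uracil (U), whereas the cytosines in the newly synthesized template, all being methylated, remain unchanged. After bisulfite treatment, one part of the DNA therefore retains the DNA methylation information and the other part retains the original DNA sequence information. These hybrid DNA fragments, which retain both methylation information and sequence information, can be used to construct a sequencing library for whole-genome methylation sequencing.
In view of the large data volume and high cost of whole-genome methylation sequencing, sequence capture technology can be used to selectively enrich specific regions of the genome: the regions of interest are isolated from the genome by a suitable method and then sequenced, which allows targeted genomic studies at lower cost. With the development of probe capture technology, companies such as Agilent and Roche have developed capture products for methylation in target regions. Agilent first captures the target region of interest and then bisulfite-treats the sample before library construction; the drawback is that the sample cannot be enriched before capture, which poses a great challenge for low-input samples. Roche's strategy is to bisulfite-treat first, then enrich, and then capture with designed probes. Because the probes target bisulfite-treated DNA, the design has to enumerate the methylated and unmethylated states of each cytosine. The probes are therefore expensive to design, and the large number of variable probes greatly reduces capture specificity.
In view of this, building on the library construction method of the improved whole-genome methylation sequencing described above, the inventors developed a new capture mode that combines the advantages of both capture approaches: DNA can be enriched before capture, and fewer probe types need to be designed.
Specifically, the invention provides the following technical solutions: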
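According to a first aspect, the invention provides a method for constructing a sequencing library based on a DNA sample, comprising: digesting the DNA sample with an endonuclease to obtain a DNA sample bearing single-strand nicks; performing a polymerization reaction on the nicked DNA sample using a polymerase, dATP, dTTP, dGTP and methylated (5mC) dCTP to obtain hybrid DNA, the hybrid DNA comprising two reverse-complementary strands, wherein the 5' end of each strand is the original sequence of the DNA sample, the 3' end of each strand is a synthesized sequence, and the base C on the 3' end of each strand is methylated; subjecting the hybrid DNA to bisulfite treatment to obtain converted hybrid DNA; and amplifying the converted hybrid DNA to obtain a sequencing library.
The invention uses an endonuclease and a polymerase to introduce methylated cytosine into the DNA template strand, producing a hybrid DNA strand that contains both the original template and a newly synthesized template. The original template in the hybrid strand carries the cytosine methylation information of the original DNA. All cytosines in the newly synthesized template are newly incorporated methylated cytosines, which preserve the original sequence information of the DNA under bisulfite treatment. Under bisulfite treatment, unmethylated cytosine (C) in the original template is converted to uracil (U), whereas the cytosines in the newly synthesized template, all being methylated, are unchanged. After bisulfite treatment, part of each DNA strand therefore retains the DNA methylation information and the other part retains the original DNA information, giving hybrid DNA fragments whose 5' end retains the methylation information and whose 3' end retains the original DNA information. A sequencing library can be built from these fragments and used for whole-genome methylation sequencing, multiplex PCR targeted sequencing or probe capture sequencing.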
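The conversion behaviour described above can be illustrated with a minimal Python sketch. It is not part of the patent: the strand, the nick position and the function names are hypothetical, and the model assumes complete conversion of unmethylated C and complete protection of 5mC, with U read out as T after amplification.

```python
def bisulfite_convert(seq, protected):
    """Return seq with every unprotected C read out as T (C -> U -> T after PCR)."""
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def convert_hybrid_strand(strand, nick_pos, original_5mc):
    """Simulate bisulfite conversion of one hybrid strand written 5'->3'.

    strand[:nick_pos] is the original template; strand[nick_pos:] is assumed
    to have been synthesized with 5mC-dCTP, so every C there is protected.
    original_5mc holds the indices of methylated C in the original part.
    """
    synthesized_c = {i for i in range(nick_pos, len(strand)) if strand[i] == "C"}
    original_c = {i for i in original_5mc if i < nick_pos}
    return bisulfite_convert(strand, synthesized_c | original_c)


strand = "ACGTCCGATTCGACCGTA"  # hypothetical 18-nt strand
converted = convert_hybrid_strand(strand, nick_pos=9, original_5mc={1})
print(converted)  # ACGTTTGATTCGACCGTA
```

In the output, the 5' half keeps C only at the methylated position, while the 3' half is identical to the input, which is the property the alignment and capture strategies rely on.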
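Compared with a conventional WGBS library, half of each molecule carries the base information after methylation conversion and half retains the original DNA base information. This offsets the extreme bias that bisulfite treatment imposes on the template and, in subsequent PCR, effectively reduces the amplification bias of methylation libraries at CpG islands; in effect, WGBS and WGS libraries are prepared in a single library construction. The retained DNA sequence information also allows reads to be placed accurately on the genome, which improves the accuracy of methylation alignment. In addition, the workflow is simplified: fragmentation, end repair and A-tailing are completed in a single step. Furthermore, a multiplex PCR capture technique can be developed on the basis of this hybrid-strand library, in which one PCR primer is designed on the DNA sequence that retains the methylation information and the other on the DNA sequence that retains the original sequence information. This avoids the primer dimers that arise when methylation primers are conventionally designed against converted DNA, and gives higher specificity than conventional methylation primers. A probe capture technique can also be developed on the basis of this hybrid library: the probes are designed on the sequence that retains the original DNA sequence information, which makes probe design far easier than designing against the converted DNA sequence.
According to embodiments of the invention, the above method for constructing a sequencing library based on a DNA sample may further include the following technical features:
In some embodiments of the invention, the endonuclease is at least one of DNase I and DNase II, or any endonuclease capable of producing the single-strand nicks. In some embodiments of the invention, the polymerase is Bst polymerase, phi29 polymerase, Klenow polymerase, or any polymerase capable of DNA polymerization.
In some embodiments of the invention, the DNA sample bearing single-strand nicks has a length of 100 to 1000 bp.
In some embodiments of the invention, the method further comprises: ligating methylated sequencing adapters to the hybrid DNA and performing bisulfite treatment, or another treatment capable of converting methylation information, to obtain converted hybrid DNA, the methylated sequencing adapters containing a first universal sequence and a second universal sequence; and amplifying the converted hybrid DNA with universal primers to obtain a sequencing library, the universal primers matching the first universal sequence and the second universal sequence. The 5' end of each converted hybrid DNA strand is the converted DNA sequence, in which all unmethylated cytosines have been converted to U; the 3' end is the newly synthesized DNA sequence, in which all cytosines are methylated, so the original DNA sequence information is preserved during conversion. A sequencing library constructed in this way enables whole-genome methylation sequencing.
In some embodiments of the invention, the methylated sequencing adapter is an adapter for any one of the MGI, Illumina, Proton or other sequencing platforms.
In some embodiments of the invention, the DNA sample is a whole-genome DNA sample.
In some embodiments of the invention, the method further comprises:
Without adapter ligation, the hybrid DNA is directly subjected to bisulfite treatment, or another treatment capable of converting methylation information, to obtain converted hybrid DNA. The 5' end of each converted hybrid DNA strand is the converted DNA sequence, in which all unmethylated cytosines have been converted to U; the 3' end is the newly synthesized DNA sequence, in which all cytosines are methylated, so the original DNA sequence information is preserved during conversion. The converted hybrid DNA is then amplified with specific primers to obtain a sequencing library for a target region of the DNA sample. The specific primers comprise a first specific primer and a second specific primer; the first specific primer is identical to the 5'-end sequence of the converted hybrid DNA, and the second specific primer is complementary to the 3'-end sequence of the converted hybrid DNA.
Primers are designed for the 5' end and the 3' end of either strand of the converted hybrid DNA, namely the first and second specific primers. One specific primer is designed against the DNA sequence that retains the methylation information and the other against the original DNA sequence. One primer is therefore rich in A, T and G, while the other contains A, T, C and G, which reduces the primer-dimer problem encountered in multiplex methylation PCR.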
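A rough Python sketch of this primer arrangement follows. It is a simplification, not the patent's primer-design procedure: the strand is hypothetical, every C in the forward primer site is assumed to be unmethylated (and therefore fully converted), and real designs would also weigh melting temperature, CpG content and specificity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primer_pair(original_strand, primer_len):
    """Sketch a primer pair for one hybrid strand written 5'->3'.

    The first primer is identical to the converted 5' end, so every C is
    written as T (assumption: no methylated C inside the primer site).
    The second primer is complementary to the unchanged 3' end.
    """
    first = original_strand[:primer_len].replace("C", "T")
    second = reverse_complement(original_strand[-primer_len:])
    return first, second


# Hypothetical 25-nt strand, used only for illustration
first_primer, second_primer = design_primer_pair("ACGTCCGATTAGGCATTCGACCGTA", 8)
print(first_primer)   # ATGTTTGA  (A/T/G only)
print(second_primer)  # TACGGTCG  (all four bases)
```

The first primer uses a three-letter alphabet while the second keeps all four bases, which is the asymmetry the text credits with reducing primer-dimer formation.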
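In some embodiments of the invention, the method further comprises:
performing hybridization capture on the converted hybrid DNA with probes, followed by elution, to obtain a hybridization product, the probes being used to capture the 3'-end sequence of the converted hybrid DNA, that is, the template strand whose DNA sequence information is unchanged after bisulfite treatment; and amplifying the hybridization product to obtain a sequencing library. In this method the probes are designed against the strand that retains the original DNA sequence information. Compared with ordinary capture approaches, in which probes are designed against the converted DNA strand, this makes capture probes easier to design, improves the specificity of probe capture, and greatly increases capture efficiency and data utilization. The method is also suitable for constructing and sequencing targeted methylation libraries from trace amounts of DNA.
According to a second aspect, the invention provides a method for sequencing a DNA sample, comprising:
obtaining a sequencing library from the DNA sample by the method of any embodiment of the first aspect of the invention; and
sequencing the sequencing library to obtain a sequencing result for the DNA sample.
According to embodiments of the invention, the sequencing is paired-end or single-end sequencing.
According to a third aspect, the invention provides a method for determining the methylation status of a DNA sample, comprising:
obtaining the sequencing library from the DNA sample by the method of any embodiment of the first aspect of the invention;
sequencing the sequencing library to obtain a sequencing result for the DNA sample;
aligning the sequencing results of the 5' end and the 3' end of the DNA sample separately to a reference genome to determine the position information of the 5' end and of the 3' end of the DNA strand; and
comparing and analyzing the position of the DNA sample on the basis of the position information of the 5' end and the 3' end, to determine the methylation status of the DNA sample.
According to embodiments of the invention, the above method for determining the methylation status of a DNA sample may further include the following technical features:
In some embodiments of the invention, the method for determining the methylation status of a DNA sample further comprises:
if the 3' end maps to multiple candidate positions and the 5' end maps to one candidate position, and one of the 3'-end candidate positions lies near the 5'-end candidate position, the position of the 5' end is used;
if the 3' end maps to multiple candidate positions and the 5' end maps to multiple candidate positions, the best candidate position shared by the 5' end and the 3' end is used;
if the 3' end maps to one candidate position and the 5' end maps to multiple candidate positions, and one of the 5'-end candidate positions lies near the 3'-end candidate position, the position of the 3' end is used;
if the 3' end maps to one candidate position and the 5' end maps to one candidate position, and the two candidate positions lie near each other, the position of either the 3' end or the 5' end is used. All other cases are treated as multi-mapping: the alignment position of the reads cannot be determined exactly, but the position to which the 3' end aligns may preferentially be taken as the primary alignment position.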
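One possible reading of these rules is sketched below in Python. The `Hit` tuple, the scoring, the function names and the 1000 bp default window are assumptions made for illustration (the description elsewhere mentions "within 100-1000 bp" for nearby positions), and "best shared candidate" is interpreted here as the nearby pair with the highest combined alignment score.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, int, int]  # (chromosome, position, alignment score)


def near(a: Hit, b: Hit, max_dist: int) -> bool:
    """True if two hits are on the same chromosome within max_dist bp."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= max_dist


def resolve_position(hits_5p: List[Hit], hits_3p: List[Hit],
                     max_dist: int = 1000) -> Optional[Hit]:
    """Pick one alignment position from the 5'-end and 3'-end candidates."""
    if not hits_3p:
        return None  # case not covered by the rules; left unresolved here
    pairs = [(h5, h3) for h5 in hits_5p for h3 in hits_3p
             if near(h5, h3, max_dist)]
    if pairs:
        # Best mutually consistent pair, ranked by summed alignment score
        best_5p, best_3p = max(pairs, key=lambda p: p[0][2] + p[1][2])
        if len(hits_5p) == 1 and len(hits_3p) > 1:
            return best_5p  # unique 5' end disambiguates the 3' end
        return best_3p      # otherwise report the 3'-end position
    # Fallback for multi-mapping: prefer the best 3'-end alignment
    return max(hits_3p, key=lambda h: h[2])


unique_5p = resolve_position([("chr1", 1000, 60)],
                             [("chr1", 1300, 40), ("chr5", 900, 40)])
unique_3p = resolve_position([("chr2", 500, 30), ("chr7", 80, 30)],
                             [("chr7", 400, 60)])
fallback = resolve_position([("chr1", 10, 50)],
                            [("chr2", 10, 20), ("chr3", 10, 35)])
```

The fallback branch mirrors the statement that the 3'-end alignment may be preferred when the two ends cannot be reconciled.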
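In some embodiments of the invention, the 3' end is aligned to the reference genome with BWA, and the 5' end is aligned to the reference genome with BS-map.
According to a fourth aspect, the invention provides a kit comprising: an endonuclease, nucleic acid amplification reagents, methylated dCTP and methylation detection reagents.
In some embodiments of the invention, the kit further comprises first specific primers and second specific primers, the first specific primers comprising SEQ ID NO: 7 to SEQ ID NO: 16 and the second specific primers comprising SEQ ID NO: 17 to SEQ ID NO: 26.
In some embodiments of the invention, the kit further comprises probes, the probes being used to capture target sequences and construct a nucleic acid library of the target region.
According to a fifth aspect, the invention provides a double-stranded DNA comprising two reverse-complementary strands, each strand comprising a 5'-end sequence and a 3'-end sequence, wherein every base C on the 3' end of each strand is methylated. The 5'-end sequence of the double-stranded DNA is a sequence that retains methylation information. It may be a bisulfite-treated sequence in which all unmethylated cytosines have been converted to U, or a sequence obtained by enzymatic treatment (for example, TET2 oxidation followed by APOBEC treatment). Every C base in the 3'-end sequence is methylated, so the cytosine information is preserved during conversion.
In some embodiments of the invention, the double-stranded DNA has a length of 100 to 1000 bp.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of embodiments taken together with the drawings, in which:
Figure 1 is a flowchart of DNA methylation hybrid library construction according to an embodiment of the invention.
Figure 2 is a flowchart of DNA methylation hybrid multiplex PCR according to an embodiment of the invention.
Figure 3 shows quality-control results for the methylation-DNA hybrid library according to an embodiment of the invention.
Figure 4 shows mapping rates for the different methods according to an embodiment of the invention.
Figure 5 shows coverage of CpG sites in regions of different GC content for the different methods according to an embodiment of the invention.
Figure 6 shows genome-wide coverage for the different methods according to an embodiment of the invention.
Figure 7 shows the sequencing depth of each amplicon according to an embodiment of the invention.
Figure 8 is a flowchart of DNA methylation hybrid library capture according to an embodiment of the invention.
Figure 9 compares methylation rates at target sites according to an embodiment of the invention.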
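Detailed description
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements or elements with identical or similar functions throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
For a more intuitive understanding of this application, terms used in it are explained below. Those skilled in the art will understand that these explanations are given only for ease of understanding and should not be regarded as limiting the scope of protection of this application.
Unless otherwise stated, when denoting bases herein, N or n means any of the bases A, T, C or G.
Where conversion with bisulfite is described herein, the terms bisulfate, bisulfite, hydrogen sulfite and hydrogen bisulfite are all used with the same meaning. Conversion by enzymatic treatment is also included and likewise falls within the scope of the invention.
According to one aspect, the invention provides a method for constructing a sequencing library based on a DNA sample, comprising: (1) digesting the DNA sample with an endonuclease to obtain a DNA sample bearing single-strand nicks; (2) performing a polymerization reaction on the nicked DNA sample using a polymerase, dATP, dTTP, dGTP and methylated dCTP to obtain hybrid DNA, wherein the 5' end of each strand of the hybrid DNA is the original sequence of the DNA sample, the 3' end of each strand is a synthesized sequence, and the C bases on the 3' end of each strand are methylated; (3) subjecting the hybrid DNA to bisulfite treatment to obtain converted hybrid DNA; and (4) amplifying the converted hybrid DNA to obtain a sequencing library.
After digestion with an endonuclease such as DNase I, the DNA sample carries random single-strand nicks with a phosphate group at the 5' end and a hydroxyl group at the 3' end. A polymerase (for example, Bst polymerase) is added together with an equimolar mixture of methylated dCTP and normal dATP, dTTP and dGTP. Bst polymerase extends from the 3' end of the nick and displaces the nicked strand, producing hybrid DNA fragments made up of original DNA plus newly synthesized DNA. The original DNA retains the original methylation information; every C base in the newly synthesized DNA is methylated, so the original DNA sequence information is preserved under bisulfite or enzymatic treatment.
The DNA sample may be genomic DNA. Besides DNase I, the single-strand nicks may be produced by other endonucleases that cut at random, such as DNase II, by restriction endonucleases, or by other endonucleases. The length of the DNA sample may be controlled between 100 and 1000 bp.
Polymerization and strand displacement are carried out with the polymerase and a 5mC dNTP mix (an equimolar mixture of 5m-dCTP, dATP, dTTP and dGTP), and a base A is added to the 3' end of the newly formed double-stranded DNA. Besides Bst polymerase, the polymerase may be another strand-displacing polymerase such as phi29, a polymerase with 5'-3' exonuclease activity and A-tailing activity such as Klenow, or another DNA polymerase, with or without A-tailing activity, that has strand-displacement or 5'-3' exonuclease activity.
After steps (1) and (2), the cytosines at the 5' end of each strand in the resulting hybrid DNA retain their original methylation status, while the cytosines at the 3' end are all methylated cytosines. Methylated sequencing adapters are ligated to the hybrid DNA, which is then bisulfite-treated. In the 5' part, unmethylated C bases are converted to U while methylated C bases are unchanged, so the original methylation information is retained; in the 3' part, all the methylated C bases are unchanged, so the original DNA sequence information is retained. PCR amplification with universal primers matching the methylated sequencing adapters yields a sequencing library that retains both the methylation information and the original DNA information, and high-throughput sequencing of this library gives both the DNA methylation information and the original DNA sequence information. In at least some embodiments, the methylated sequencing adapter may be any methylated sequencing adapter for the MGI, Illumina, Proton or other sequencing platforms. Accordingly, these platforms can be used for high-throughput sequencing of the library.
In at least some embodiments, high-throughput sequencing may be paired-end or single-end, preferably paired-end. One read contains the bisulfite-converted sites, where unmethylated cytosines have been converted to thymine, and is used to call methylation sites; the other read retains the original DNA information and is used to help locate the alignment. In this way accurate genomic methylation information and the genomic DNA sequence are obtained together.
Sequence analysis and alignment use both ends. Reads containing bisulfite-converted sites are placed on the genome with software such as BS-map (methylation-aware alignment), and reads retaining the original information are aligned against the whole genome with software such as BWA. 1) If the former maps to multiple positions and the latter to one position, and a position near the latter (within 100-1000 bp) is one of the former's candidates, the latter's position is used. 2) If both map to multiple positions, a position supported by both, with the two ends not far apart, is used; if there are several such positions, the best alignment is taken. 3) If the former maps to one position and the latter to multiple positions, and a position near the former (within 100-1000 bp) is one of the latter's candidates, the former's position is used. The best alignment result is selected, PCR-derived duplicate sequences are removed, and the genomic information and genomic methylation information are analyzed, including statistics on the genomic base mutation frequency and the genomic methylation rate.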
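The last two steps, duplicate removal and methylation-rate calculation, can be sketched as follows. This is an illustrative simplification rather than the patent's pipeline: reads are plain dictionaries with hypothetical fields, all reads are assumed to be forward-strand and gap-free, and duplicates are defined naively as reads with the same chromosome, start and sequence.

```python
from collections import Counter


def deduplicate(reads):
    """Keep one read per (chrom, start, seq); a naive stand-in for PCR duplicate removal."""
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["start"], read["seq"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


def methylation_rate(reads, chrom, ref_pos):
    """Fraction of C among C+T calls at a reference cytosine, or None if uncovered."""
    counts = Counter()
    for read in reads:
        offset = ref_pos - read["start"]
        if read["chrom"] == chrom and 0 <= offset < len(read["seq"]):
            counts[read["seq"][offset]] += 1
    informative = counts["C"] + counts["T"]
    return counts["C"] / informative if informative else None


reads = [
    {"chrom": "chr1", "start": 100, "seq": "ACGT"},
    {"chrom": "chr1", "start": 100, "seq": "ACGT"},   # duplicate of the first read
    {"chrom": "chr1", "start": 100, "seq": "ATGT"},   # converted (unmethylated) C
    {"chrom": "chr1", "start": 99, "seq": "GACGT"},
]
unique_reads = deduplicate(reads)
rate = methylation_rate(unique_reads, "chr1", 101)  # 2 C calls and 1 T call
```

Only reads from the converted (5') part of a fragment are informative for this rate; reads from the protected 3' part always show C and would have to be excluded upstream.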
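In at least some embodiments, before step (4), one or more pairs of PCR primers are designed to amplify loci of interest. One primer is located in the region that retains the DNA methylation information and the other in the region that retains the original DNA information. PCR amplification gives the sequence of the locus of interest for methylation analysis, and the amplification products can be used for electrophoresis, Sanger sequencing or high-throughput sequencing. In other words, one primer is located in sequence whose cytosines have been methylated, and the other in sequence whose unmethylated cytosines have been converted to thymine.
In at least some embodiments, before step (4), probes are hybridized to the retained original DNA sequence near the methylation site of interest so that the whole DNA molecule is captured, giving a methylation library for the target site. The captured target-site methylation library is obtained by magnetic-bead binding and elution, and the library obtained after PCR amplification is then subjected to high-throughput sequencing. With probes designed in this way, bisulfite-treated DNA can be enriched and amplified, which increases the input available for capture. Probes need not be designed for every possible methylation state, which reduces the number of probe types and improves capture specificity.
The probes may be DNA or RNA probes, in liquid phase or on a solid phase, and may be 60-120 nt long. They are designed against the original DNA sequence and carry biotin or another modification for subsequent separation and purification; probes designed by other methods may also be used. The method is compatible with all existing probe types for DNA sequence capture. The probes are used for hybridization capture of bisulfite-treated templates in which one half retains DNA methylation information and the other half retains DNA sequence information; the DNA probes bind the part that retains the DNA sequence information (preferably obtained by the scheme above). After hybridization, the DNA is captured on streptavidin beads, or beads carrying another biological modification, and eluted, and the eluted product is amplified by PCR to obtain a library ready for sequencing.
The solutions of the invention are explained below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in an example, the techniques or conditions described in the literature of the field, or the product instructions, are followed. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.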
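Example 1: Whole-genome methylation library construction and sequencing
10 ng of gDNA from the Yanhuang (YH) cell line was used to prepare whole-genome methylation libraries by the method of the invention and by a conventional method. The libraries were sequenced on a BGISEQ-500 sequencer (PE100, 30X depth), and the data were analyzed for performance including data utilization, mapping rate and GC bias. The procedure was as follows:
1. Fragmentation, end repair and A-tailing
DNase I (catalog no. 0303S) and Bst polymerase (catalog no. M0374S) from NEB were used for fragmentation, end repair and A-tailing. The reaction system and conditions were as follows:
Figure PCTCN2019092116-appb-000001
The reaction was placed on a PCR instrument at 37 °C for 10 min and then 65 °C for 30 min. Here, "5mC dNTP mix" denotes a mixture of methylated dCTP and normal dATP, dTTP and dGTP.
After the reaction, the product was purified with 1.0X AMPure beads and dissolved in 20 μL of elution buffer.
2. Ligation of methylated adapters:
1) The DNA from the previous step was used to set up the ligation reaction for the methylated adapters (also called "methylated sequencing adapters") according to the table below:
Figure PCTCN2019092116-appb-000002
*The methylated adapter sequences are:
Adapter 1 (SEQ ID NO: 1):
5'-/5Phos/AGTCGGAGGCCAAGCGGTCTTAGGAAGACAANNNNNNNNNNCAACTCCTTGGCTCACA-3'
Adapter 2 (SEQ ID NO: 2): 5'-AGCCAAGGTCAGTAACGACATGGCTACGATCCGACTT-3'
All C bases in adapters 1 and 2 are protected by methylation; the N bases form the sample barcode.
2) The reaction was incubated on a Thermomixer (Eppendorf) at 20 °C for 15 min to obtain the ligation product. After the reaction, the product was purified with 1.0X AMPure beads and dissolved in 22 μL of elution buffer.
3. Bisulfite treatment
The ligated DNA was bisulfite-treated with the EZ DNA Methylation-Gold Kit™ (ZYMO). The steps were as follows:
(1) Reagents
Preparation of CT Conversion Reagent solution: take the CT Conversion Reagent (solid mixture) from the kit and add 900 μL of water, 50 μL of M-Dissolving Buffer and 300 μL of M-Dilution Buffer. Dissolve at room temperature with vortexing for 10 min, or shake on a shaker for 10 min.
Preparation of M-Wash Buffer: add 24 mL of 100% ethanol to the M-Wash Buffer before use.
(2) Add 130 μL of CT Conversion Reagent solution and the ligated DNA to a PCR tube, and mix by flicking the tube or pipetting.
Place the sample tube on a PCR instrument and run the following program:
98 °C for 5 min, then 64 °C for 2.5 h.
After the program, proceed immediately to the next step or store at 4 °C (for up to 20 h).
(3) Place a Zymo-Spin IC™ Column in a Collection Tube and add 600 μL of M-Binding Buffer.
Load the bisulfite-treated sample onto the Zymo-Spin IC™ Column containing M-Binding Buffer, close the cap and mix by inversion.
Centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 100 μL of M-Wash Buffer to the column, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 200 μL of M-Desulphonation Buffer to the column, leave at room temperature for 15 min, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 200 μL of M-Wash Buffer to the column, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through; repeat this step once more.
Place the Zymo-Spin IC™ Column in a new 1.5 mL EP tube, add 20 μL of M-Elution Buffer to the column matrix, leave at room temperature for 2 min and centrifuge at full speed (>10,000 x g) to elute the target DNA fragments.
4. PCR amplification
The target DNA from the previous step was used to set up the PCR reaction below. The amplification enzyme system was KAPA HiFi HotStart Uracil+ ReadyMix (2X) from KAPA (catalog no. kk2801).
Figure PCTCN2019092116-appb-000003
Universal primer 1 (SEQ ID NO: 3): /5Phos/GAACGACATGGCTACGA
Universal primer 2 (SEQ ID NO: 4): TGTGAGCCAAGGAGTTG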
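As a quick consistency check on the sequences listed in this example, the Python sketch below confirms that universal primer 2 is the reverse complement of the last 17 nt of adapter 1, and locates the run of N bases that forms the sample barcode. The helper names are illustrative; only the two sequences come from the text.

```python
ADAPTER_1 = "AGTCGGAGGCCAAGCGGTCTTAGGAAGACAANNNNNNNNNNCAACTCCTTGGCTCACA"
UNIVERSAL_PRIMER_2 = "TGTGAGCCAAGGAGTTG"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def barcode_span(adapter):
    """Return (start, end) of the N run in the adapter, end exclusive."""
    return adapter.index("N"), adapter.rindex("N") + 1


primer_site = ADAPTER_1[-len(UNIVERSAL_PRIMER_2):]
primer_matches = reverse_complement(primer_site) == UNIVERSAL_PRIMER_2
start, end = barcode_span(ADAPTER_1)
print(primer_matches, end - start)  # True, and the barcode length in nt
```

Because the adapter cytosines are methylation-protected, this primer site is expected to survive bisulfite conversion unchanged, which is why an unconverted primer sequence can be used for amplification.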
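PCR reaction conditions
Figure PCTCN2019092116-appb-000004
After the reaction, the product was purified with AMPure beads and dissolved in 22 μL of elution buffer.
5. Library QC:
The insert size and concentration of the library were checked with a Bioanalyzer system (Agilent, Santa Clara, USA); the constructed high-throughput sequencing library was thereby checked.
6. Sequencing
The library was sequenced on the BGISEQ-500 platform (PE100). After alignment, basic parameters were calculated, including raw data, usable data and mapped data. The results are shown in Table 1 below.
The QC profile of the library obtained by the method of the invention is shown in Figure 3.
Table 1. Sequencing results
Figure PCTCN2019092116-appb-000005
In Table 1, "CpG covered" is the number of CpG sites with a depth of 1X or more; "coverage" is the proportion of all CpG sites that have a depth of 1X or more; and "10X coverage" is the proportion of all CpG sites that have a depth of 10X or more. The mapping rates obtained with the above methods are shown in Figure 4; the mapping rate obtained with the method of the invention is better than that of conventional WGBS.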
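The three Table 1 metrics defined above can be computed from a list of per-CpG depths, as in this small Python sketch. The function name, the output keys and the example depths are made up for illustration; they are not values from the patent.

```python
def cpg_coverage_stats(cpg_depths, thresholds=(1, 10)):
    """Count CpG sites at or above each depth threshold and the covered fraction."""
    total = len(cpg_depths)
    stats = {"total_cpg": total}
    for threshold in thresholds:
        covered = sum(1 for depth in cpg_depths if depth >= threshold)
        stats[f"covered_{threshold}x"] = covered
        stats[f"coverage_{threshold}x"] = covered / total if total else 0.0
    return stats


# Hypothetical depths at eight CpG sites
stats = cpg_coverage_stats([0, 1, 5, 12, 30, 0, 10, 9])
print(stats["covered_1x"], stats["coverage_1x"], stats["coverage_10x"])  # 6 0.75 0.375
```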
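Coverage as a function of GC content for the different methods is shown in Figure 5; the coverage obtained with the method of the invention is better than that of conventional WGBS.
Genome-wide coverage for the different methods is shown in Figure 6; the genome-wide coverage obtained with the method of the invention is better than that of the conventional method.
Moreover, Table 1 shows that the method of the invention detects more CpG sites than conventional WGBS.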
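Example 2: Targeted methylation library construction
Primers were designed for 10 methylation sites. Forward primers were designed upstream of each site against the bisulfite-converted genomic sequence, and reverse primers were designed downstream of each site against the original genomic sequence (sequences are given in Table 3). The multiplex primers were used for multiplex PCR on the bisulfite-treated methylation-DNA hybrid.
1. Fragmentation, end repair and A-tailing
DNase I and Bst from NEB were used for fragmentation, end repair and A-tailing. The reaction system and conditions were as follows:
Figure PCTCN2019092116-appb-000006
The reaction was placed on a PCR instrument at 37 °C for 10 min and then 65 °C for 10 min. After the reaction, the product was purified with 1.0X AMPure beads and dissolved in 20 μL of elution buffer.
2. Bisulfite treatment
The DNA from the previous step was bisulfite-treated with the EZ DNA Methylation-Gold Kit™ (ZYMO). The steps were as follows:
1) Preparation of CT Conversion Reagent solution: take the CT Conversion Reagent (solid mixture) from the kit and add 900 μL of water, 50 μL of M-Dissolving Buffer and 300 μL of M-Dilution Buffer. Dissolve at room temperature with vortexing for 10 min, or shake on a shaker for 10 min.
Preparation of M-Wash Buffer: add 24 mL of 100% ethanol to the M-Wash Buffer before use.
2) Add 130 μL of CT Conversion Reagent solution and the DNA from the previous step to a PCR tube, and mix by flicking the tube or pipetting.
Then place the sample tube on a PCR instrument and run the following program:
98 °C for 5 min
64 °C for 2.5 h
After the program, proceed immediately to the next step or store at 4 °C (for up to 20 h).
3) Place a Zymo-Spin IC™ Column in a Collection Tube and add 600 μL of M-Binding Buffer.
Load the bisulfite-treated sample onto the Zymo-Spin IC™ Column containing M-Binding Buffer, close the cap and mix by inversion.
Centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 100 μL of M-Wash Buffer to the column, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 200 μL of M-Desulphonation Buffer to the column, leave at room temperature for 15 min, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 200 μL of M-Wash Buffer to the column, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through; repeat this step once more.
Place the Zymo-Spin IC™ Column in a new 1.5 mL EP tube, add 20 μL of M-Elution Buffer to the column matrix, leave at room temperature for 2 min and centrifuge at full speed (>10,000 x g) to elute the target DNA fragments.
3. First-round PCR amplification
The target DNA from the previous step was used to set up the PCR reaction as follows:
Figure PCTCN2019092116-appb-000007
PCR reaction conditions
Figure PCTCN2019092116-appb-000008
Figure PCTCN2019092116-appb-000009
After the reaction, the product was purified with 1.0X AMPure beads and dissolved in 22 μL of elution buffer.
4. Second-round PCR amplification
The target DNA from the previous step was used to set up the PCR reaction as follows:
Figure PCTCN2019092116-appb-000010
Universal primer 3 (SEQ ID NO: 5): /5Phos/GAACGACATGGCTACGATCCGACTT
Universal primer 4 (SEQ ID NO: 6):
Figure PCTCN2019092116-appb-000011
where the N bases are a molecular barcode.
The PCR reaction conditions were as follows:
Figure PCTCN2019092116-appb-000012
After the reaction, the product was purified with 1.0X AMPure beads and dissolved in 22 μL of elution buffer.
5. Library QC:
The insert size and concentration of the library were checked with a Bioanalyzer system (Agilent, Santa Clara, USA);
the constructed high-throughput sequencing library for specific genomic regions of the sample was thereby checked.
6. Sequencing
The library was sequenced on the BGISEQ-500 platform (PE100). After alignment, basic parameters were calculated, including raw data, usable data, mapping rate and GC content.
The results are shown in Table 2, and the depth of each amplicon is shown in Figure 7.
Table 2. Sequencing data
Figure PCTCN2019092116-appb-000013
Table 2 shows that the method of the invention gives a good mapping rate and on-target rate. Figure 7 shows that depth is uniform across the amplicons.
Table 3. Primer sequences
Figure PCTCN2019092116-appb-000014
Figure PCTCN2019092116-appb-000015
The specific primer pool was an equimolar mixture of the above primers at a final concentration of 10 μM.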
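Example 3: Capture of exon methylation regions
10 ng of gDNA from the Yanhuang (YH) cell line was used to prepare a library in which one half of each molecule retains DNA methylation information and the other half retains DNA sequence information. Hybridization capture was then performed with the MGI exome capture kit (MGI Tech MGIeasy exome capture V4 probe reagent, catalog no. 1000007745), and the captured library was sequenced on an MGIseq-2000 sequencer (PE100, 100X depth). The data were analyzed for performance including data utilization, mapping rate and GC bias. The procedure was as follows:
1. Fragmentation, end repair and A-tailing
DNase I and Bst from NEB were used for fragmentation, end repair and A-tailing. The reaction system and conditions were as follows:
Figure PCTCN2019092116-appb-000016
Figure PCTCN2019092116-appb-000017
The reaction was placed on a PCR instrument at 37 °C for 10 min and then 65 °C for 10 min. After the reaction, the product was purified with 1.0X AMPure beads and dissolved in 20 μL of elution buffer.
2. Ligation of methylated adapters:
1) The DNA from the previous step was used to set up the ligation reaction for the methylated adapters (also called "methylated sequencing adapters") according to the table below:
Figure PCTCN2019092116-appb-000018
*The methylated adapter sequences are the same as in Example 1, i.e. SEQ ID NO: 1 and SEQ ID NO: 2.
2) The reaction was incubated on a Thermomixer (Eppendorf) at 20 °C for 15 min to obtain the ligation product. After the reaction, the product was purified with 1.0X AMPure beads and dissolved in 22 μL of elution buffer.
3. Bisulfite treatment
The ligated DNA was bisulfite-treated with the EZ DNA Methylation-Gold Kit™ (ZYMO). The steps were as follows:
1) Preparation of CT Conversion Reagent solution: take the CT Conversion Reagent (solid mixture) from the kit and add 900 μL of water, 50 μL of M-Dissolving Buffer and 300 μL of M-Dilution Buffer. Dissolve at room temperature with vortexing for 10 min, or shake on a shaker for 10 min.
Preparation of M-Wash Buffer: add 24 mL of 100% ethanol to the M-Wash Buffer before use.
2) Add 130 μL of CT Conversion Reagent solution and the ligated DNA to a PCR tube, and mix by flicking the tube or pipetting.
Then place the sample tube on a PCR instrument and run the following program:
98 °C for 5 min
64 °C for 2.5 h
After the program, proceed immediately to the next step or store at 4 °C (for up to 20 h).
3) Place a Zymo-Spin IC™ Column in a Collection Tube and add 600 μL of M-Binding Buffer.
Load the bisulfite-treated sample onto the Zymo-Spin IC™ Column containing M-Binding Buffer, close the cap and mix by inversion.
Centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 100 μL of M-Wash Buffer to the column, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 200 μL of M-Desulphonation Buffer to the column, leave at room temperature for 15 min, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through.
Add 200 μL of M-Wash Buffer to the column, centrifuge at full speed (>10,000 x g) for 30 s and discard the flow-through; repeat this step once more.
Place the Zymo-Spin IC™ Column in a new 1.5 mL EP tube, add 20 μL of M-Elution Buffer to the column matrix, leave at room temperature for 2 min and centrifuge at full speed (>10,000 x g) to elute the target DNA fragments.
4. PCR amplification
The target DNA from the previous step was used to set up the PCR reaction below. The amplification enzyme system was KAPA HiFi HotStart Uracil+ ReadyMix (2X) from KAPA (catalog no. kk2801).
Figure PCTCN2019092116-appb-000019
Universal primers 1 and 2 are the same as in Example 1, i.e. SEQ ID NO: 3 and SEQ ID NO: 4.
PCR reaction conditions
Figure PCTCN2019092116-appb-000020
After the reaction, the product was purified with AMPure beads and dissolved in 22 μL of elution buffer.
5. Hybridization
1) Take 1000 ng of PCR product according to its concentration. For pooled hybridization of multiple samples, use at least 250 ng per sample, with a total PCR product input of between 1000 ng and 2000 ng.
Prepare the Block mix (see Table 4):
Table 4. Preparation of the Block mix
Component    Volume per reaction
Block 1      2.5 μL
Block 2      2.5 μL
Block 3      1 μL
Block 4      10 μL
Total        16 μL
2) Pipette 16 μL of the prepared Block mix into the sample to make the pre-hybridization mix, and concentrate it to 9 μL in a concentrator. If the volume falls below 9 μL, make it up to 9 μL with NF water.
3) Place the 9 μL of pre-hybridization mix on a PCR instrument and pre-hybridize under the conditions in Table 5:
Table 5. Pre-hybridization conditions
Temperature    Time
Heated lid     On
95 °C          5 min
65 °C          Hold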
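The input rule in step 1) of the hybridization section (at least 250 ng per sample, 1000 to 2000 ng in total) can be turned into a small planning helper. The Python sketch below is illustrative only: the function name, the sample names and the concentrations are hypothetical, and equal mass per sample is an assumption the text does not require.

```python
MIN_PER_SAMPLE_NG = 250
MIN_TOTAL_NG = 1000
MAX_TOTAL_NG = 2000


def pooled_input_volumes(concentrations_ng_per_ul, mass_per_sample_ng):
    """Return the volume (in μL) of each PCR product needed for one capture.

    A single sample is covered by the same total-input check; pooled samples
    must additionally contribute at least MIN_PER_SAMPLE_NG each.
    """
    n_samples = len(concentrations_ng_per_ul)
    if n_samples > 1 and mass_per_sample_ng < MIN_PER_SAMPLE_NG:
        raise ValueError("each pooled sample needs at least 250 ng")
    total = mass_per_sample_ng * n_samples
    if not MIN_TOTAL_NG <= total <= MAX_TOTAL_NG:
        raise ValueError("total input must be between 1000 and 2000 ng")
    return {name: mass_per_sample_ng / conc
            for name, conc in concentrations_ng_per_ul.items()}


# Four hypothetical libraries pooled at 250 ng each (1000 ng in total)
volumes = pooled_input_volumes({"A": 50.0, "B": 25.0, "C": 100.0, "D": 62.5}, 250)
print(volumes)  # {'A': 5.0, 'B': 10.0, 'C': 2.5, 'D': 4.0}
```

Since the pre-hybridization mix is concentrated to 9 μL afterwards, the combined sample volume does not need to fit a fixed limit at this stage.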
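6. Hybridization capture
1) Prepare the hybridization mix in a new 0.2 mL PCR tube (see Table 6):
Table 6. Preparation of the hybridization mix
Figure PCTCN2019092116-appb-000021
2) Incubate the hybridization mix on a PCR instrument at 65 °C for at least 5 min. Hold it up to a light source and confirm that no crystals remain in the mix before use.
3) In a new 96-well PCR plate (recommended), prepare the probe mix on ice (see Table 7):
Table 7. Preparation of the probe mix
Component             Volume
NF water              1.5 μL
Block 5               0.5 μL
MGI Exome V4 Probe    5 μL
Total                 7 μL
4) Place the probe mix on a PCR instrument and incubate under the conditions in Table 8:
Table 8. Incubation of the probe mix
Temperature    Time
Heated lid     On
65 °C          2 min
65 °C          Hold
5) Keeping all the mixes at 65 °C, quickly transfer 13 μL of hybridization mix into the 9 μL of pre-hybridization mix and mix by pipetting.
6) Keeping the mixes at 65 °C, quickly transfer all 22 μL from the previous step into the probe mix and mix by pipetting.
7) Quickly seal the PCR plate with high-transparency adhesive sealing film, pressing it down firmly so that every well is fully sealed, and repeat this step once (i.e. seal twice).
8) Keep the 96-well PCR plate at 65 °C and hybridize for 24 h under the conditions in Table 9.
Table 9. Hybridization conditions
Temperature            Time
Heated lid (105 °C)    On
65 °C                  Hold
7. Preparation before elution
1) At least 30 min in advance, set the Thermomixer to 65 °C. Transfer 1.8 mL of Wash Buffer II to a 2.0 mL centrifuge tube and preheat it to 65 °C in the Thermomixer.
2) Vortex the M-280 beads thoroughly and pipette 50 μL of M-280 beads into a new 2.0 mL centrifuge tube.
3) Add 200 μL of Binding Buffer and vortex for 5 s until all beads are resuspended.
4) Briefly centrifuge the tube, place it on a magnetic rack and let it stand for 2-5 min until the liquid is clear. Carefully remove and discard the supernatant with a pipette.
5) Repeat the above steps twice.
6) Resuspend the beads in 200 μL of Binding Buffer.
8. Elution
1) After the 24 h incubation, keep the hybridization reaction on the PCR instrument at 65 °C. Cut the sealing film open with a blade, quickly aspirate the hybridization mix with a pipette while estimating the remaining volume, and transfer it to the centrifuge tube containing 200 μL of beads from the previous step.
2) Place the tube on a Nutator or similar device for 360° rotary mixing, and incubate with rotation at room temperature for 30 min.
3) Remove the sample from the Nutator.
4) Briefly centrifuge the tube, place it on a magnetic rack and let it stand for 2-5 min until the liquid is clear. Carefully remove and discard the supernatant with a pipette.
5) Add 500 μL of Wash Buffer I, invert until all beads are resuspended, and incubate at room temperature for 15 min.
6) Briefly centrifuge the tube, place it on a magnetic rack and let it stand for 2-5 min until the liquid is clear. Carefully remove and discard the supernatant with a pipette.
7) Add 500 μL of preheated Wash Buffer II to the tube. In the Thermomixer, shake at 1000 rpm for 10 s to resuspend all the beads, then set the speed to 0 rpm and incubate without shaking at 65 °C for 10 min.
8) Briefly centrifuge the tube, place it on a magnetic rack and let it stand for 30 s until the liquid is clear. Carefully remove and discard the supernatant with a pipette.
9) Repeat steps 7) and 8) twice.
10) Resuspend the beads in 100 μL of NF water, transfer the entire resuspended sample (including beads) to a new 1.5 mL centrifuge tube and centrifuge briefly.
11) Place the 1.5 mL tube on a magnetic rack and let it stand for 2 min until the liquid is completely clear. Carefully remove and discard the supernatant; a small-volume pipette can be used to aspirate repeatedly so that as little liquid as possible remains.
12) Resuspend the beads in 44 μL of NF water and transfer the entire resuspended sample (including beads) to a new PCR tube.
9. Post-hybridization PCR
1) Prepare the post-hybridization PCR mix on ice (see Table 10):
Table 10. Preparation of the post-hybridization PCR mix
Component              Volume per reaction
Post-PCR Enzyme Mix    50 μL
PCR Primer Mix         6 μL
Total                  56 μL
2) Pipette 56 μL of the prepared PCR mix into the PCR tube containing the beads, vortex three times for 3 s each, and centrifuge briefly to collect the reaction at the bottom of the tube.
3) Place the PCR tube on a PCR instrument and run post-hybridization PCR under the conditions in Table 11:
Table 11. Post-hybridization PCR conditions
Figure PCTCN2019092116-appb-000022
Figure PCTCN2019092116-appb-000023
10. Library QC:
The insert size and concentration of the library were checked with a Bioanalyzer system (Agilent, Santa Clara, USA);
the constructed high-throughput sequencing library for specific genomic regions of the sample was thereby checked.
11. Sequencing
The library was sequenced on the MGIseq-2000 platform (PE100). After alignment, basic parameters were calculated, including raw data, mapped data and the proportion of data in the target region.
12. Results
Basic statistics obtained with the method of this example are given in Table 12.
A comparison of the target-site methylation rates obtained with the method of this example and with pyrosequencing is shown in Figure 9.
Table 12
Figure PCTCN2019092116-appb-000024
In Table 12, the mapping rate is the proportion of reads aligned to the genome; the duplication rate is the proportion of reads sequenced at the same position; the capture rate is the ratio of reads aligned to the target region to total reads; the average depth is the mean sequencing depth over the target region; and 20X coverage is the proportion of the target region covered at a depth of 20X or more.
After sequencing, the data were aligned using both the DNA sequence and the methylation sequence. Of the data, 87.8% aligned; of the aligned data, 49.5% fell in exon and flanking regions. The average depth of the target region was 99.3X and 20X coverage was 95.2%, which shows that the method captures methylation effectively.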
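The Table 12 metrics can be written out as a short Python sketch. The function name and the example counts are hypothetical. Total reads is used as the denominator of the capture rate, following the definition given for Table 12; the results paragraph appears to report the on-target fraction relative to aligned data, so the intended denominator is an assumption here.

```python
def capture_metrics(total_reads, mapped_reads, duplicate_reads,
                    on_target_reads, target_depths):
    """Compute mapping, duplication and capture rates plus target-region depth metrics."""
    n_positions = len(target_depths)
    return {
        "mapping_rate": mapped_reads / total_reads,
        "duplication_rate": duplicate_reads / total_reads,
        "capture_rate": on_target_reads / total_reads,
        "mean_depth": sum(target_depths) / n_positions,
        "coverage_20x": sum(1 for d in target_depths if d >= 20) / n_positions,
    }


# Made-up counts and per-position depths, for illustration only
metrics = capture_metrics(total_reads=1000, mapped_reads=900, duplicate_reads=100,
                          on_target_reads=450, target_depths=[10, 20, 30, 40])
print(metrics["mean_depth"], metrics["coverage_20x"])  # 25.0 0.75
```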
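In the description of the invention, the terms "first", "second" and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly specifying the number of the technical features referred to. A feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature. In the description of the invention, "a plurality" means at least two, for example two or three, unless specifically defined otherwise.
In this specification, a description referring to "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such expressions in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of different embodiments or examples.
Although embodiments of the invention have been shown and described above, it will be understood that these embodiments are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the invention.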

Claims (17)

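  1. A method for constructing a sequencing library based on a DNA sample, characterized by comprising:
    digesting the DNA sample with an endonuclease to obtain a DNA sample bearing single-strand nicks;
    performing a polymerization reaction on the DNA sample bearing single-strand nicks using a polymerase, dATP, dTTP, dGTP and methylated dCTP to obtain hybrid DNA, the hybrid DNA comprising two reverse-complementary strands, wherein the 5' end of each strand is the original sequence of the DNA sample, the 3' end of each strand is a synthesized sequence, and the base C on the 3' end of each strand is methylated;
    subjecting the hybrid DNA to bisulfite treatment to obtain converted hybrid DNA; and
    amplifying the converted hybrid DNA to obtain a sequencing library.
  2. The method according to claim 1, characterized in that the endonuclease is DNase I, DNase II or any endonuclease capable of producing the single-strand nicks.
  3. The method according to claim 1, characterized in that the DNA sample bearing single-strand nicks has a length of 100 to 1000 bp.
  4. The method according to claim 1, characterized by further comprising:
    ligating methylated sequencing adapters to the hybrid DNA and performing bisulfite treatment to obtain converted hybrid DNA, the methylated sequencing adapters containing a first universal sequence and a second universal sequence; and
    amplifying the converted hybrid DNA with universal primers to obtain a sequencing library, the universal primers matching the first universal sequence and the second universal sequence.
  5. The method according to claim 1, characterized in that the DNA sample is a whole-genome DNA sample.
  6. The method according to claim 1, characterized by further comprising:
    performing the amplification on the converted hybrid DNA with specific primers to obtain a sequencing library for a target region of the DNA sample, the specific primers comprising a first specific primer and a second specific primer, the first specific primer being located at the 5' end of the converted hybrid DNA and the second specific primer being located at the 3' end of the converted hybrid DNA.
  7. The method according to claim 1, characterized by further comprising:
    performing hybridization capture on the converted hybrid DNA with probes, followed by elution, to obtain a capture product, the probes being used to hybridize to the 3'-end sequence of the converted hybrid DNA; and
    amplifying the capture product to obtain a sequencing library.
  8. A method for sequencing a DNA sample, characterized by comprising:
    obtaining a sequencing library from the DNA sample by the method of any one of claims 1 to 7; and
    sequencing the sequencing library to obtain a sequencing result for the DNA sample.
  9. The method according to claim 8, characterized in that the sequencing is paired-end or single-end sequencing.
  10. A method for determining the methylation status of a DNA sample, characterized by comprising:
    obtaining the sequencing library from the DNA sample by the method of any one of claims 1 to 7;
    sequencing the sequencing library to obtain a sequencing result for the DNA sample;
    aligning the sequencing results of the 5' end and the 3' end of the DNA sample separately to a reference genome to determine the position information of the 5' end and of the 3' end; and
    comparing and analyzing the position of the DNA sample on the basis of the position information of the 5' end and the 3' end, to determine the methylation status of the DNA sample.
  11. The method according to claim 10, characterized by further comprising:
    if the 3' end maps to multiple candidate positions and the 5' end maps to one candidate position, and one of the 3'-end candidate positions lies near the 5'-end candidate position, using the position of the 5' end;
    if the 3' end maps to multiple candidate positions and the 5' end maps to multiple candidate positions, using the best candidate position shared by the 5' end and the 3' end;
    if the 3' end maps to one candidate position and the 5' end maps to multiple candidate positions, and one of the 5'-end candidate positions lies near the 3'-end candidate position, using the position of the 3' end;
    if the 3' end maps to one candidate position and the 5' end maps to one candidate position, and the two candidate positions lie near each other, using the position of either the 3' end or the 5' end;
    in all other cases, taking the position to which the 3' end aligns as the primary alignment position.
  12. The method according to claim 10, characterized in that the 3' end is aligned to the reference genome with BWA, and the 5' end is aligned to the reference genome with BS-map.
  13. A kit, characterized by comprising: an endonuclease, nucleic acid amplification reagents, methylated dCTP and methylation detection reagents.
  14. The kit according to claim 13, characterized by further comprising: first specific primers and second specific primers, the first specific primers comprising SEQ ID NO: 7 to SEQ ID NO: 16 and the second specific primers comprising SEQ ID NO: 17 to SEQ ID NO: 26.
  15. The kit according to claim 13, characterized by further comprising: probes, the probes being used to capture target sequences and construct a nucleic acid library of the target region.
  16. A double-stranded DNA, characterized in that the double-stranded DNA comprises two reverse-complementary strands, each strand comprising a 5'-end sequence and a 3'-end sequence, wherein every base C on the 3' end of each strand is methylated.
  17. The double-stranded DNA according to claim 16, characterized in that the double-stranded DNA has a length of 100 to 1000 bp.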
PCT/CN2019/092116 2019-06-20 2019-06-20 基于dna样本构建测序文库的方法及应用 WO2020252749A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201980092842.5A CN113544282B (zh) 2019-06-20 2019-06-20 基于dna样本构建测序文库的方法及应用
EP19933969.8A EP3988665B1 (en) 2019-06-20 2019-06-20 Method and use for construction of sequencing library based on dna samples
PCT/CN2019/092116 WO2020252749A1 (zh) 2019-06-20 2019-06-20 基于dna样本构建测序文库的方法及应用
US17/545,724 US20220090059A1 (en) 2019-06-20 2021-12-08 Method and use for construction of sequencing library based on dna samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/092116 WO2020252749A1 (zh) 2019-06-20 2019-06-20 基于dna样本构建测序文库的方法及应用

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/545,724 Continuation US20220090059A1 (en) 2019-06-20 2021-12-08 Method and use for construction of sequencing library based on dna samples

Publications (1)

Publication Number Publication Date
WO2020252749A1 true WO2020252749A1 (zh) 2020-12-24

Family

ID=74037596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/092116 WO2020252749A1 (zh) 2019-06-20 2019-06-20 基于dna样本构建测序文库的方法及应用

Country Status (4)

Country Link
US (1) US20220090059A1 (zh)
EP (1) EP3988665B1 (zh)
CN (1) CN113544282B (zh)
WO (1) WO2020252749A1 (zh)


Also Published As

Publication number Publication date
EP3988665B1 (en) 2023-08-30
EP3988665A4 (en) 2022-06-29
US20220090059A1 (en) 2022-03-24
EP3988665A1 (en) 2022-04-27
CN113544282A (zh) 2021-10-22
CN113544282B (zh) 2024-05-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933969

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2019933969

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2019933969

Country of ref document: EP

Effective date: 20220120