WO2013071876A1 - Method for constructing a high-throughput sequencing library and application thereof - Google Patents

Method for constructing a high-throughput sequencing library and application thereof

Info

Publication number
WO2013071876A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
fragment
specific
dna
methylation
Prior art date
Application number
PCT/CN2012/084691
Other languages
English (en)
French (fr)
Inventor
高飞
王君文
王童
蒋慧
武靖华
吴红龙
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司 and 深圳华大基因研究院
Priority to US14/358,674 (patent US9920363B2)
Publication of WO2013071876A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C40B40/08 Libraries containing RNA or DNA which encodes proteins, e.g. gene libraries
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • The present invention relates to the field of biotechnology, in particular to methylation detection techniques, and more particularly to methylation detection of specific regions of a genome. More specifically, the present invention provides a method for constructing a high-throughput sequencing library, a method for determining methylation information of a specific region of the genome of a sample, an apparatus for determining methylation information of a specific region of the genome of a sample, and a kit for constructing a high-throughput sequencing library of a genome-specific region of a sample.
  • Background art
  • DNA methylation plays an important role in maintaining normal cell function, suppressing damage to genome integrity by parasitic DNA elements, chromatin structure modification, X-chromosome inactivation, genomic imprinting, embryonic development, and the occurrence of human tumors. It is one of the current research hotspots.
  • The present invention is directed to solving at least one of the problems of the prior art.
  • The present invention provides a method for constructing a high-throughput sequencing library and an application thereof.
  • The invention provides a method of constructing a high-throughput sequencing library.
  • The method comprises the steps of: fragmenting genomic DNA to obtain DNA fragments;
  • end-repairing the DNA fragments to obtain end-repaired DNA fragments;
  • adding a base A to the 3' end of the end-repaired DNA fragments to obtain DNA fragments having a sticky end A;
  • ligating the DNA fragments having the sticky end A to a methylated linker to obtain a ligation product; subjecting the ligation product to hybridization capture using a specific probe to obtain a target fragment; and subjecting the target fragment to PCR amplification to obtain an amplification product.
  • The specific probe that can be used is specific for a known methylation site. For example,
  • the specific probe is designed using the human genome as a reference sequence and a gene region known to have a methylation site as the target sequence.
  • The gene regions known to have methylation sites may be the coding region and promoter region of at least one of the genes listed in Table 1.
  • Table 1. Molecular function-related genes
  • HDAC' HDACi i HDAtM HDAC8 HDAO HDACH
  • KUVBU I ReliefBFI ARRBI MOR 4:SJ RUVBU D AP3: EPCi EPSOO TR5MI6 ⁇ iRR TRRAP TCF3
  • With this method, a high-throughput sequencing library of specific regions of genomic DNA that are known to have methylation sites can be constructed efficiently and applied effectively to high-throughput sequencing technology.
  • Methylation site information for the specific regions of the genome can thus be obtained efficiently, realizing methylation detection of the specific regions of the genome of a DNA sample.
  • The present invention provides a method for determining methylation information of a specific region of a genome of a sample.
  • The method includes the following steps: constructing a high-throughput sequencing library of a specific region of the genome of the sample according to the above method for constructing a high-throughput sequencing library; sequencing the high-throughput sequencing library of the specific region to obtain sequencing results; and performing data analysis on the sequencing results to determine methylation information for the specific region of the genome of the sample.
  • This method can accurately determine the methylation information of a specific region of the genome of the sample, thereby realizing methylation detection of the specific region of the genome of the sample.
  • The present invention provides an apparatus for determining methylation information of a specific region of a genome of a sample.
  • The apparatus comprises: a library preparation unit for preparing a high-throughput sequencing library of a specific region of the genome of a sample, wherein the library preparation unit is provided with a specific probe; a sequencing unit, which is connected to the library preparation unit and receives the high-throughput sequencing library of the genome-specific region of the sample from it in order to sequence the library and obtain sequencing results; and a data analysis unit, which is connected to the sequencing unit and receives the sequencing results from it in order to analyze them and determine the methylation information of the specific region of the genome of the sample.
  • The apparatus can conveniently and accurately determine methylation information of a specific region of the genome of a sample, and can be applied to a plurality of specific regions of the genome.
  • The invention provides a kit for constructing a high-throughput sequencing library of a genome-specific region of a sample. According to an embodiment of the invention, the kit comprises a specific probe that is specific for a known methylation site. Using the kit, a high-throughput sequencing library of a specific region of the genome of a sample can be constructed conveniently and effectively.
  • Figure 1 shows a schematic flow diagram of a method for constructing a high-throughput sequencing library according to one embodiment of the present invention;
  • Figure 2 shows, for a method of determining methylation information of a specific region of a genome according to an embodiment of the present invention, the percentage of the probe target region on each chromosome that is covered by the capture region at different coverage depths (coverage depth > 1 and coverage depth > 5);
  • (c) shows, according to an embodiment of the present invention, the original distribution of the genome-specific regions of the sample, the reads distribution of the high-throughput sequencing library of the genome-specific regions of the sample, and the distribution of methylation levels in the promoter and CpG island regions;
  • Figure 5 shows a schematic diagram of an apparatus for determining methylation information of a specific region of a genome of a sample according to an embodiment of the present invention;
  • Figure 6 shows the insert fragment length distribution of the sequencing reads (also referred to herein as "reads") in accordance with one embodiment of the present invention;
  • Figure 7 shows the sequencing depth and cumulative sequencing depth of each base of the capture region in accordance with one embodiment of the present invention;
  • Figure 8 shows the frequency of base mismatches at each position during read alignment according to an embodiment of the present invention;
  • Figure 10 shows an intuitive view of the capture of individual genes in accordance with one embodiment of the present invention.
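  • The per-chromosome percentages described for Figure 2 could be computed roughly as follows. This is a minimal sketch, not the inventors' analysis code: the function names, the per-base depth lists, and the strict "greater than" comparison (taken literally from "coverage depth > 1" and "coverage depth > 5") are assumptions made for illustration.

```python
from typing import Dict, Sequence


def covered_fraction(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of target bases whose depth is strictly greater than min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth > min_depth)
    return covered / len(depths)


def coverage_by_chromosome(
    depths_by_chrom: Dict[str, Sequence[int]],
    thresholds: Sequence[int] = (1, 5),
) -> Dict[str, Dict[int, float]]:
    """For each chromosome, report the covered fraction at each depth threshold.

    depths_by_chrom maps a chromosome name to the per-base sequencing depth
    of every base inside the probe target region (hypothetical input layout).
    """
    return {
        chrom: {t: covered_fraction(depths, t) for t in thresholds}
        for chrom, depths in depths_by_chrom.items()
    }


if __name__ == "__main__":
    example = {"chr1": [0, 2, 6, 10], "chr2": [7, 8, 9]}
    for chrom, fractions in coverage_by_chromosome(example).items():
        print(chrom, {t: f"{100 * f:.1f}%" for t, f in fractions.items()})
```

  • Multiplying each fraction by 100 gives the percentage of the target region covered on that chromosome at the chosen depth.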
  • The invention provides a method of constructing a high-throughput sequencing library.
  • According to an embodiment of the invention, the method comprises the following steps:
  • First, genomic DNA is fragmented to obtain DNA fragments.
  • DNA as used in the present invention may be any polymer comprising deoxyribonucleotides, including, but not limited to, modified or unmodified DNA. It will be understood by those skilled in the art that the source of the genomic DNA is not particularly limited: it can be obtained by any possible route, for example purchased commercially, obtained directly from other laboratories, or extracted directly from a sample.
  • According to an embodiment of the present invention, the genomic DNA can be obtained by extraction from the sample.
  • The method of constructing a high-throughput sequencing library may further comprise the step of extracting genomic DNA from a sample. According to some specific examples of the present invention, the sample may be derived from at least one of mammals, plants, and microorganisms.
  • According to an embodiment of the invention, the mammal can be at least one of human and mouse.
  • The genomic DNA can be human whole-blood genomic DNA, preferably peripheral blood mononuclear cell genomic DNA. The inventors found that when a high-throughput sequencing library is constructed using YH cell genomic DNA, the genomic DNA is easy to extract from the sample.
  • The amount of the genomic DNA is not particularly limited; according to a specific embodiment of the present invention, the amount of genomic DNA is 2 μg.
  • The inventors have surprisingly found that when the amount of genomic DNA is 2 μg, the high-throughput sequencing library of the genome-specific regions of the sample constructed by the method of the embodiment of the present invention can be easily applied to high-throughput sequencing technologies, such as Solexa sequencing technology. The sequencing results are accurate and reproducible, the methylation information of the specific regions is complete, and the coverage of methylation sites is high.
  • Next, the DNA fragments are end-repaired to obtain end-repaired DNA fragments. According to an embodiment of the present invention, before the DNA fragments are end-repaired, the method may further comprise a step of purifying the DNA fragments, so that the subsequent end repair can be carried out easily.
  • According to an embodiment of the present invention, the end repair of the DNA fragments can be carried out using Klenow fragment, T4 DNA polymerase, and T4 polynucleotide kinase, wherein the Klenow fragment has 5'→3' polymerase activity and 3'→5' exonuclease activity but lacks 5'→3' exonuclease activity, thereby enabling convenient and accurate end repair of the DNA fragments. According to an embodiment of the invention, the method may further comprise a step of purifying the end-repaired DNA fragments, thereby facilitating subsequent processing.
  • Next, a base A is added to the 3' end of the end-repaired DNA fragments to obtain DNA fragments having a sticky end A.
  • According to an embodiment of the present invention, Klenow (3'-5' exo-), that is, Klenow lacking 3'→5' exonuclease activity, can be used to add the base A to the 3' end of the end-repaired DNA fragments, so that the base A can be added conveniently and accurately.
  • According to an embodiment of the present invention, the method may further comprise a step of purifying the DNA fragments having the sticky end A, thereby facilitating subsequent processing.
  • Next, the DNA fragment having the sticky end A is ligated to a methylated linker to obtain a ligation product.
  • The term "methylated linker" as used herein refers to a linker in which all C sites in its nucleotide sequence are methylated. According to an embodiment of the present invention, before the DNA fragment having the sticky end A is ligated to the methylated linker, the method may further comprise a step of methylating a linker used in conventional sequencing. This prevents the subsequent bisulfite treatment and other operations from affecting sequencing; without it, the linker sequence could be altered during the bisulfite treatment. It will be understood by those skilled in the art that the method of methylating the linker is not particularly limited, and the sequencing linker can be methylated using any method known in the art.
  • The methylated linker can further carry a tag, thereby facilitating simultaneous construction of high-throughput sequencing libraries of the genome-specific regions of a plurality of samples, which can be applied effectively to high-throughput sequencing platforms.
  • Then, based on data analysis of the sequencing results and the sequence information of the tags, the sequence information of the high-throughput sequencing libraries of the genome-specific regions of the plurality of samples, and hence the methylation information of the genome-specific regions of each sample, can be accurately distinguished.
  • This makes full use of the high-throughput sequencing platform, saving time and reducing costs.
  • The length of the tag is 6 bp.
  • During library preparation, different samples are ligated to linkers carrying different tags, and a plurality of different libraries are mixed together to form a new library before capture; the new library is then used for probe capture and sequencing.
  • The sequencing data are therefore a mixture of multiple samples, and the samples are separated according to the tag sequence carried by each read (a sequence read by the sequencer). According to embodiments of the present invention, this approach can greatly reduce cost, time, and labor.
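  • Separating pooled reads by their tag might look like the sketch below. It is illustrative only: the text gives the tag length (6 bp) but not where the tag sits in a read, so placing the tag at the start of each read, the exact-match lookup, the "unassigned" bin, and all names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

TAG_LENGTH = 6  # tag length in bp, as stated in the text


def demultiplex(reads: Iterable[str], tag_to_sample: Dict[str, str]) -> Dict[str, List[str]]:
    """Group reads by sample using the first TAG_LENGTH bases as the tag.

    Assumption: the tag occupies the first 6 bases of each read and must
    match a known tag exactly; reads with an unknown tag go to "unassigned".
    The tag is trimmed off before the read is stored.
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LENGTH], read[TAG_LENGTH:]
        sample = tag_to_sample.get(tag, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


if __name__ == "__main__":
    tags = {"ACGTAC": "sample_1", "TTGCAA": "sample_2"}  # hypothetical tags
    pooled = ["ACGTACGGGG", "TTGCAACCCC", "NNNNNNAAAA"]
    print(demultiplex(pooled, tags))
```

  • A production demultiplexer would usually also tolerate one mismatch in the tag; that refinement is left out to keep the sketch short.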
  • According to an embodiment of the present invention, the ligation of the DNA fragment having the sticky end A to the methylated linker is carried out using T4 DNA ligase, whereby the ligation product can be conveniently obtained. The method may further comprise a step of purifying the ligation product, which facilitates subsequent processing.
  • Next, the ligation product is subjected to hybridization capture using a specific probe to obtain a target fragment.
  • The term "specific probe" herein means that the probe is specific for a known methylation site.
  • The specific probe is designed using the human genome as a reference sequence and specific genomic regions known to have methylation sites as target sequences.
  • According to the present invention, the genomic regions known to have methylation sites include at least one of a promoter region, a CpG island region, a CpG extra-island region, and an imprinted gene region.
  • Thus, hybridization capture with the specific probe can effectively capture the sequences in the sample that are complementary to the target sequence, that is, the genomic regions of the sample known to have methylation sites (sometimes referred to in this specification as "genome-specific regions").
  • The genomic region known to have methylation sites that can be used to design the specific probe is the coding region and promoter region of at least one of the genes listed in Table 1.
  • The coding region is the exon region sequence.
  • According to an embodiment of the present invention, the promoter region is the region from 2200 bp upstream to 500 bp downstream of the transcription start site of the gene.
  • According to the invention, the specific probe is designed using the eArray system.
  • The length of the probe is 120 mer.
  • Specifically, the region from 2200 bp upstream to 500 bp downstream of the transcription start site (TSS) of a gene is taken as the promoter region,
  • the exon region sequences are taken as the coding region of the gene,
  • and capture probes are designed from the sequence information of these regions.
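  • The promoter window defined above (2200 bp upstream to 500 bp downstream of the TSS) can be turned into genomic coordinates as sketched below. The 1-based inclusive coordinate convention, the strand handling, and the function name are assumptions for illustration; the text does not say how the coordinates were computed.

```python
from typing import Tuple

UPSTREAM_BP = 2200   # upstream extent of the promoter window, from the text
DOWNSTREAM_BP = 500  # downstream extent of the promoter window, from the text


def promoter_interval(
    tss: int,
    strand: str,
    upstream: int = UPSTREAM_BP,
    downstream: int = DOWNSTREAM_BP,
) -> Tuple[int, int]:
    """Return (start, end) of the promoter window around a TSS.

    Assumes 1-based inclusive coordinates. "Upstream" follows the direction
    of transcription, so it lies at smaller coordinates on the "+" strand
    and at larger coordinates on the "-" strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return max(start, 1), end  # clip at the chromosome start


if __name__ == "__main__":
    print(promoter_interval(10_000, "+"))
    print(promoter_interval(10_000, "-"))
```

  • The resulting intervals, together with the exon intervals of the same genes, would form the target list handed to the probe design system.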
  • These genes were used; judging from the sequencing results for the captured target regions, coverage shows no bias across chromosomes.
  • The single-stranded capture probe can bind complementarily to the single-stranded target sequence, thereby successfully capturing the target region. According to an embodiment of the present invention, the probe design can use either a solid-phase capture chip (the probes are immobilized on a solid support) or liquid-phase capture probes (the probes are free in solution).
  • However, the solid-phase capture chip is limited by many factors, such as probe length, probe density, and high price, so liquid-phase capture is preferred.
  • The eArray probe design system of Agilent is used, and the probe length is 120 mer. The probes can cover a wide range of lengths, from less than 200 kb to 24 Mb or even longer.
  • The eArray probe system can readily use the bioinformatics tools WindowMasker (window-based sequence masking) and RepeatMasker (repetitive sequence masking) to analyze and mask the target region, thus avoiding probe design in repetitive sequences.
  • This design is very effective in reducing capture interference in the experiment and alignment interference in the subsequent sequence analysis, and the shortened coverage length reduces cost to a certain extent.
  • Because of the molecular structure of the bases C and G, nucleic acid sequences with high CG content (CG base content above 60%) are captured less efficiently than conventional sequences (A, T, C, and G each averaging 25%).
  • Capture of the target CGI (CpG island) regions can therefore be improved by increasing the number of probes designed for them.
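  • The 60% CG-content criterion can be checked per target sequence as sketched below. The threshold comes from the text; the helper names and the `boost` factor used to request extra probes for high-CG targets are hypothetical, since the text does not state by how much the probe number is increased.

```python
HIGH_GC_THRESHOLD = 0.60  # "CG base content of more than 60%", from the text


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_high_gc(seq: str, threshold: float = HIGH_GC_THRESHOLD) -> bool:
    """True if the G+C fraction of seq is strictly above the threshold."""
    return gc_fraction(seq) > threshold


def probe_copies(seq: str, boost: int = 2) -> int:
    """Number of probe copies to request for a target sequence.

    Hypothetical rule: high-GC targets get `boost` copies, others get one.
    The value of `boost` is an illustration, not a figure from the source.
    """
    return boost if is_high_gc(seq) else 1


if __name__ == "__main__":
    for target in ("GGCCGCGCAT", "ATATGCATAT"):
        print(target, round(gc_fraction(target), 2), probe_copies(target))
```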
  • According to an embodiment of the present invention, the method may further comprise a step of using Cot-1 DNA and linker blocking sequences to block, by hybridization, the ligation product (especially the repetitive regions in the genomic sequence of the ligation product) and the methylated linker on the ligation product, respectively. The inventors have surprisingly found that after the ligation product (especially the repetitive regions in its genomic sequence) and the methylated linker on the ligation product are blocked with Cot-1 DNA and linker blocking sequences, respectively, the hybridization capture efficiency of the ligation product is remarkably improved.
  • The amount of Cot-1 DNA is not particularly limited. According to a specific example,
  • an excess of Cot-1 DNA is used to block the repetitive regions in the genomic sequence of the ligation product.
  • The term "excess" as used herein means that the amount of Cot-1 DNA is much larger than the amount of ligation product to be hybridized; that is, the amount of Cot-1 DNA may be more than twice the amount of the ligation product to be subjected to hybridization capture.
  • According to a specific embodiment of the present invention, the amount of Cot-1 DNA is 5 times the amount of the ligation product to be hybridized and captured.
  • Too much Cot-1 DNA affects the binding of the probe to the ligation product and lowers the hybridization capture efficiency. When Cot-1 DNA is used at 5 times the amount of the ligation product to be hybridized and captured,
  • the repetitive regions in the genomic sequence of the ligation product can be blocked conveniently and efficiently, thereby effectively avoiding the strong non-specific hybridization background signal generated by repetitive sequences during the subsequent nucleic acid hybridization.
  • This reduces interference with the nucleic acid hybridization and improves the hybridization result.
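  • The blocking-DNA amount described above is simple arithmetic; a small helper makes the two stated constraints explicit ("excess" meaning more than twice the ligation product, with 5 times as the specific example). The function name and the use of micrograms as the unit are assumptions for illustration.

```python
def blocking_dna_amount(ligation_product_ug: float, fold: float = 5.0) -> float:
    """Amount of blocking DNA (in micrograms) for a given amount of ligation product.

    The default 5-fold excess is the specific example given in the text;
    a fold of 2 or less is rejected because "excess" is described there
    as more than twice the ligation product.
    """
    if ligation_product_ug <= 0:
        raise ValueError("ligation product amount must be positive")
    if fold <= 2:
        raise ValueError("fold must be greater than 2 to count as excess")
    return ligation_product_ug * fold


if __name__ == "__main__":
    print(blocking_dna_amount(1.0))       # default 5-fold excess
    print(blocking_dna_amount(0.5, 4.0))  # a different, still excess, ratio
```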
  • According to an embodiment of the present invention, the linker blocking sequence includes at least one selected from Block1 and Block2, whereby the methylated linker on the ligation product can be effectively blocked. According to an embodiment of the present invention, hybridization capture can be carried out using 1 μg of the ligation product, thereby improving hybridization capture efficiency.
  • According to an embodiment of the present invention, hybridization capture of the ligation product using the specific probe may further comprise capturing the target fragment with streptavidin magnetic beads, whereby the target fragment can be captured efficiently. Next, the target fragment is subjected to PCR amplification to obtain an amplification product.
  • According to an embodiment of the present invention, the converted target fragment can be subjected to PCR amplification using a hot-start Taq DNA polymerase.
  • The kind of hot-start Taq DNA polymerase is not particularly limited; according to a specific example of the present invention, it may be r-Taq polymerase, whereby PCR amplification is efficient and takes little time.
  • The amplification product is isolated and purified, and the purified amplification product constitutes a genome methylation high-throughput sequencing library.
  • According to the present invention, the method for isolating and purifying the amplification product is not limited.
  • The high-throughput sequencing library has a library fragment length of 300-450 bp, so that it can be applied easily and efficiently to high-throughput sequencing platforms such as the Solexa sequencing platform. The sequencing results are reproducible, true, and reliable,
  • and the methylation information of the genome-specific regions targeted by the specific probe is complete.
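  • A quick quality check against the stated 300-450 bp library fragment length could look like this sketch. Only the range comes from the text; the function names and the idea of reporting the in-range fraction are assumptions.

```python
from typing import Sequence

MIN_LIBRARY_BP = 300  # lower bound of the library fragment length, from the text
MAX_LIBRARY_BP = 450  # upper bound of the library fragment length, from the text


def in_library_range(length: int, lo: int = MIN_LIBRARY_BP, hi: int = MAX_LIBRARY_BP) -> bool:
    """True if a fragment length falls inside the inclusive library range."""
    return lo <= length <= hi


def fraction_in_range(lengths: Sequence[int]) -> float:
    """Fraction of fragments whose length lies within the library range."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if in_library_range(n)) / len(lengths)


if __name__ == "__main__":
    measured = [250, 300, 400, 450, 500]  # hypothetical fragment lengths in bp
    print(f"{100 * fraction_in_range(measured):.0f}% of fragments are in range")
```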
  • After the target fragment is obtained, it can be subjected to bisulfite treatment to convert the unmethylated cytosines of the target fragment into uracil, yielding a converted target fragment.
  • The target fragment may further be mixed with fragmented λ-DNA prior to the bisulfite treatment.
  • The fragmented λ-DNA can be prepared by any method known in the art; for example, it can be prepared together with the DNA fragmentation treatment described above.
  • The bisulfite treatment can be carried out by any method known in the art. According to specific examples of the present invention, it can be carried out using a commercially available kit, preferably the EZ DNA Methylation-Gold Kit™ (ZYMO).
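  • The effect of bisulfite treatment on a sequence can be simulated in a few lines. This is an in-silico sketch, not a description of the kit chemistry: methylated positions are supplied by hand, and using the spiked-in λ-DNA as an unmethylated control to estimate the conversion rate is a common practice assumed here rather than something the text states. All names are hypothetical.

```python
from typing import Iterable


def bisulfite_convert(seq: str, methylated_positions: Iterable[int] = ()) -> str:
    """Convert unmethylated C to U; C at the given 0-based positions stays C."""
    protected = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


def after_pcr(converted: str) -> str:
    """After PCR amplification, uracil is read as thymine."""
    return converted.replace("U", "T")


def conversion_rate(original: str, converted: str) -> float:
    """Fraction of cytosines in an unmethylated control that became U."""
    if len(original) != len(converted):
        raise ValueError("sequences must have the same length")
    cytosines = [i for i, base in enumerate(original.upper()) if base == "C"]
    if not cytosines:
        raise ValueError("control sequence contains no cytosine")
    return sum(1 for i in cytosines if converted[i] == "U") / len(cytosines)


if __name__ == "__main__":
    fragment = "ACGTCC"
    treated = bisulfite_convert(fragment, methylated_positions=[1])
    print(treated, after_pcr(treated))
    # A linker with every C methylated passes through unchanged.
    print(bisulfite_convert("ACC", methylated_positions=[1, 2]))
```

  • The last call illustrates why the linker is methylated at all C sites: its sequence is unchanged by the treatment, so it remains usable for sequencing.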
  • In addition, the library can be sequenced directly after capture, and then, based on the sequencing results, single nucleotide polymorphisms (SNPs), nucleotide mutations, insertions and deletions (indels), and copy number variations (CNVs) of genes can be analyzed, and differentially methylated regions (DMRs) can be identified.
  • The method for constructing a high-throughput sequencing library can efficiently construct a high-throughput sequencing library of the genome-specific regions of a sample, which can be applied effectively to high-throughput sequencing technology.
  • The methylation information of the specific regions of the genome of the sample can thus be obtained effectively, realizing methylation detection of the specific regions of the genome of the sample.
  • The invention provides a method of determining methylation information for a specific region of a genome of a sample.
  • The method comprises the following steps: constructing a high-throughput sequencing library of the genome-specific region of the sample according to the method for constructing a high-throughput sequencing library of the present invention; sequencing the high-throughput sequencing library of the genome-specific region to obtain sequencing results; and performing data analysis on the sequencing results to determine methylation information for the specific region of the genome of the sample.
  • According to some embodiments of the invention, sequencing is performed using high-throughput sequencing techniques. Those skilled in the art will appreciate that sequencing can be performed by any high-throughput sequencing technique known in the art; according to specific examples of the present invention, HiSeq 2000 is preferably used for sequencing.
  • The inventors have found that the high-throughput sequencing library of a genome-specific region of a sample can be sequenced with HiSeq 2000.
  • The method for determining methylation information of a specific region of a genome of a sample can efficiently construct a high-throughput sequencing library of the genome-specific region of the sample, and the library can be sequenced accurately by high-throughput sequencing technology such as Solexa sequencing technology.
  • Based on data analysis of the sequencing results, the methylation information of the genome-specific region can be determined accurately, thereby realizing methylation detection of the specific region of the genome of the sample. The coverage of methylation sites in the specific region is high, and the methylation information is complete.
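  • The text does not spell out the data analysis step. One common way to express the methylation level of a cytosine after bisulfite sequencing, assumed here, is the ratio of reads that still show C to all reads showing C or T at that site. The sketch below applies that ratio with a minimum-depth filter; the names, input layout, and filter are hypothetical.

```python
from typing import Dict, Optional, Tuple


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Methylation level of one site: C / (C + T), or None if uncovered.

    c_reads: reads supporting an unconverted (methylated) cytosine.
    t_reads: reads supporting a converted (unmethylated) cytosine.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


def site_levels(
    counts: Dict[str, Tuple[int, int]],
    min_depth: int = 1,
) -> Dict[str, float]:
    """Methylation level for every site covered by at least min_depth reads.

    counts maps a site label (for example "chr1:100") to (c_reads, t_reads).
    """
    levels: Dict[str, float] = {}
    for site, (c_reads, t_reads) in counts.items():
        if c_reads + t_reads >= min_depth:
            level = methylation_level(c_reads, t_reads)
            if level is not None:
                levels[site] = level
    return levels


if __name__ == "__main__":
    observed = {"chr1:100": (8, 2), "chr1:200": (0, 0), "chr1:300": (1, 3)}
    print(site_levels(observed, min_depth=5))
```

  • Averaging these per-site levels over a promoter or CpG island interval would give a regional methylation level.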
  • The present invention provides an apparatus for determining methylation information of a specific region of a genome of a sample.
  • The apparatus 1000 comprises a library preparation unit 100, a sequencing unit 200, and a data analysis unit 300.
  • The library preparation unit 100 is used to prepare a high-throughput sequencing library of a genome-specific region of the sample, wherein the library preparation unit 100 is provided with a specific probe.
  • According to a specific example of the present invention, the specific probe is specific for a known methylation site. According to a specific example of the invention, the specific probe is designed using the human genome as a reference sequence
  • and specific genomic regions known to have methylation sites as target sequences. Specifically, the genomic regions known to have methylation sites include at least one selected from the promoter region, the CpG island region, the CpG extra-island region, and the imprinted gene region.
  • Thus, hybridization capture using the specific probe can efficiently capture the sequences in the sample that are complementary to the target sequence, i.e., the genomic regions of the sample known to have methylation sites.
  • The library preparation unit 100 can be adapted to implement the high-throughput sequencing library construction method described above.
  • The genomic region known to have methylation sites that can be used to design the specific probe is a region of the genes listed in Table 1.
  • The specific probe is designed using the eArray system. According to an embodiment of the invention, the probe length is optionally 120 mer. The probe has been described in detail above and is not described again here.
  • The sequencing unit 200 is connected to the library preparation unit 100. It can receive the prepared high-throughput sequencing library of the genome-specific regions of the sample from the library preparation unit 100
  • and sequence the received library, so that sequencing results are obtained.
  • The data analysis unit 300 is connected to the sequencing unit 200. It can receive the sequencing results from the sequencing unit 200 and perform data analysis on them, thereby determining the methylation information of the specific region of the genome of the sample based on the analysis result. In this way, methylation detection of specific regions of the genome of the sample is achieved.
  • the apparatus for determining methylation information of a specific region of a fixed set of samples can conveniently and accurately determine methylation information of a specific region of a genome of a sample, thereby being applicable to various genome-specific Regions, such as genes known to be methylation sites, methylation studies of group regions, for example, can be used to detect methylation abnormalities in specific regions of the genome, Kit
  • the present invention provides a kit for constructing a high-throughput repetitive library of a genome-specific city of a sample according to an embodiment of the present invention, the kit comprising: a specific probe,
  • the specific probe is specific for a known methylation site, and according to some specific examples of the present invention, the specific probe is based on the use of the human genome as a reference sequence, and the methylation position is known to be used on the genome.
  • the specific base region of the dot is designed as a target sequence.
  • a gene region having a methylation site is known to include an escape from a promoter region, a CpG island region, a Cp (3 extra-island region, and an imprinted base region).
  • At least one, thereby utilizing a specific probe for hybridization capture is capable of efficiently capturing a sequence complementary to a target sequence in a transcript, ie, a sample having a methylation site in the sample.
  • the s-region can be used to design a coding region of a specific probe having a methylation site with a methylation site as at least one of the genes listed in the table according to an embodiment of the present invention.
  • the coding region is the net of the display region
  • the promoter region is the region from the upstream 220 to the downstream 500bjp of the transcription start site of the gene according to the present invention.
  • the specific probe is an embodiment according to the present invention designed by using an eArray system.
  • the length of the probe is i2me. The probe has been described in detail above, and is not described herein again.
  • test kit may further include any other components required for constructing a high-throughput sequencing library of a specific region of the genome of the present invention, and no further description is made herein.
  • Example 1: 2 μg of human peripheral blood mononuclear cell genomic DNA was used as the sample, and the following steps were performed. 1. Genomic DNA fragmentation:
  • The DNA fragments obtained are checked by electrophoresis. The main band of the DNA fragments must lie between 150 and 300 bp, with no protein or RNA contamination. The fragments that pass this check are purified with the QIAquick PCR purification kit (Qiagen) or with magnetic beads and redissolved in 32 μL of elution buffer for later use.
  • Ligation of the methylated adapter (sometimes referred to as a "methylated tag adapter"); the ligation reaction is set up as follows:
  • T4 DNA ligase (Rapid, L603-HC-L) 3 μL; total volume 50 μL
  • The methylated adapter sequence is:
  • Adapter 2: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
  • Design of the specific probes: using the SSAHA algorithm, a set of specific probes consisting only of unique sequences is designed. Specifically, the human genome hg19 is used as the reference sequence, and genome-wide regions with methylation sites, comprising approximately to,ooo promoters, 28,000 CpG islands, 28,000 CpG island shores, and 61 imprinted genes, are selected as target sequences for probe design. Regions shorter than 200 bp are padded to a length of 200 bp, and overlapping regions are removed so that the probe sequences do not overlap. All probes must remain unique sequences when up to 3 insertions, deletions or mismatches are allowed. Each synthesized DNA probe sequence is coupled to biotin as a marker for the subsequent capture, and the designed specific probes are then produced by Roche NimbleGen and kept for later use.
  • Table 1 shows the evaluation of how well the specific probes designed according to the present invention cover the target regions.
  • The probes cover almost all promoter regions of genes, and most of the imprinted genes, CpG islands and CGI shore regions.
  • The inventors found that the uncovered regions are mostly short sequences with some repetitiveness. Adding them to the probe range would increase the amount of data from non-target regions, and the repeated sequences could also affect the capture of other regions.
  • These regions carry little methylation information and do not significantly affect the overall methylation level, so they were not used as probe design sequences.
  • Block2: 5'-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3'
  • In Block1 and Block2, the bases NNNNNN correspond to the tag sequence in the adapter sequences. The components are mixed, centrifuged briefly at full speed, then transferred to a heat block and incubated at 95 °C for 10 minutes to denature the DNA.
  • The hybridization mixture is aspirated (the volume remaining after hybridization is recorded) and added to the prepared magnetic beads. After mixing by pipetting, the tube is placed on the PCR instrument and incubated at 47 °C for 47 min (the heated lid of the PCR instrument should be set to 57 °C; vortex for 3 s every 5 min to keep the magnetic beads from settling).
  • Preparation of the CT Conversion Reagent solution: take one tube of CT Conversion Reagent (solid mixture) from the kit and add 900 μL of water, 50 μL of M-Dissolving Buffer, and 30 μL of M-Dilution Buffer. Dissolve at room temperature, either vortexing for 10 minutes or shaking on a shaker for 10 minutes.
  • In the tag sequence, each base N may be any of A, T, C and G; the tag serves as a distinguishing identifier.
  • The insert size and yield of the library are checked on the Bioanalyzer analysis system (Agilent, Santa Clara, USA), and the library concentration is quantified accurately by Q-PCR.
  • Sequencing yields the raw data directly, and the sequencing results above are obtained by basic analysis of the raw data. The basic analysis includes the following main steps. First, the library data of different samples are distinguished by the tag sequence on the adapter or on the PCR primer. Then, the raw sequencing data are cleaned of contamination, trimmed of adapters and filtered for low quality. Finally, the processed data undergo base conversion: every C on the plus strand is converted to T, and every G on the minus strand is converted to A. This gives the sequencing results of the high-throughput sequencing library of the specific genomic regions constructed in Example 1.
  • The sequencing results are analysed to determine the methylation information of the specific genomic regions of the sample. The data analysis includes aligning the reads in the sequencing results to the reference genome with SOAP 2.01, with the number of allowed mismatches set to 2, to identify the uniquely aligned reads. Analysis of these reads gives the sequence information and the methylation information of the specific genomic regions.
  • λ-DNA is used as a standard to calculate the bisulfite conversion efficiency in Example 1.
  • Sequencing depth and coverage are also analysed from the sequencing results. In this example, the coverage of all promoter regions, CpG islands, CGI shores and imprinted gene regions of the hg19 whole genome, and the coverage depth of the different regions, are used to determine the methylation level of the different covered regions.
  • This example uses the sequencing results to determine the capture efficiency of the specific probes in Example 1.
  • Figure 2 shows, for the determination of methylation information of specific genomic regions according to an embodiment of the present invention, the percentage of captured regions on each chromosome relative to the probe target regions at different coverage depths (coverage depth > 1 and coverage depth > 5).
  • Figure 2 is based on the following sequencing data: the alignment rate is 75.27%, about 14.9M reads align uniquely, and the unique alignment rate is 57.78%. According to Figure 2, at coverage depth > 1, more than 99% of the probes detect the methylation information of their capture region.
  • At coverage depth > 5, about 90% of the probes detect the methylation information of their capture region.
  • This example also analyses, for the different genomic elements, the percentage of regions in which methylation information was detected. The results are shown in Figure 3 and Table 2.
  • The genome was subjected to hybridization capture and bisulfite treatment as described above, and the sequencing data were analysed and plotted to give Figure 3 and Table 2: the raw data comprised 25.5M reads, the alignment rate was 75.27%, about 14.9M reads aligned uniquely, and the unique alignment rate was 57.78%.
  • Figure 3 shows, for the determination of methylation information of specific genomic regions according to an embodiment of the present invention, the percentage of promoters on each chromosome in which methylation information was detected, relative to the total promoters of that chromosome, at different coverage depths.
  • As Figure 3 shows, the promoters in which methylation information can be detected account for more than 70% of the total promoters on each chromosome.
  • When the coverage depth is greater than 10, the promoters in which methylation information can be detected still account for more than 60% of the total promoters on each chromosome.
  • Table 2 shows the methylation information of the specific genomic regions determined by the method of the present invention.
  • This example also analyses the distribution of methylation levels in the promoter regions, CpG islands, CGI shores and imprinted gene regions of the genome. The results are shown in Figure 4.
  • Figure 4(a) shows the distribution of methylation levels in the CpG island and CGI shore regions of the sample genome, determined according to an embodiment of the present invention. Figure 4(a) shows that CpG islands, which have a high CG content, are at a low methylation level, while the methylation level in the CGI shore regions is significantly higher than in the CpG islands.
  • Figure 4(b) shows the distribution of methylation levels in the promoter regions of the sample genome, determined according to an embodiment of the present invention.
  • Figure 4(b) shows that the methylation level near the transcription start site in the promoter region is low. All of these results are consistent with theory.
  • Figure 4(c) shows the original distribution of the specific genomic regions of the sample, the distribution of reads of the high-throughput sequencing library of those regions, and the determined distribution of methylation levels in the promoter and CpG island regions, according to an embodiment of the present invention.
  • As Figure 4(c) shows, the method of the present invention for determining methylation information of specific genomic regions of a sample captures each specific region effectively and detects the methylation information of the region accurately.
  • Example 3:
  • The read length is 49 bp.
  • The tag length is 6 bp.
  • The number of sequenced fragments is 2.67 Mb.
  • The sequencing data volume is about 240 M.
  • Table 3 shows the total amount of raw data from the sequencer, the number of reads retained after filtering, the total number of reads that align to the human genome, the alignment rate, the chip capture efficiency, and related figures.
  • Table 4 shows depth and coverage statistics for the target regions by chromosome and by gene element. Overall, the captured data show no bias in the coverage of the individual chromosomes.
  • Figure 6 shows the distribution of insert lengths in the sequencing data. The figure shows that, although no fragment size selection was performed, the insert sizes are concentrated in a narrow range.
  • Figure 7 shows the distribution of sequencing depth for each base in the target regions. The figure shows that most bases (about 75%) are covered at a depth above 20X. Increasing the amount of sequencing further would guarantee a required sequencing depth.
  • Figure 8 shows, for all aligned reads, how often a mismatch occurs at each base position of the read during alignment.
  • Figure 9 shows the sequencing coverage of the target genes. Figure 9 shows that at a depth of 10X or more, 80% of the genes have coverage above 60%, which indicates that the genes are captured effectively. Increasing the amount of sequencing further, so as to reach a certain depth, would give 100% coverage of all the genes involved. Figure 10 shows the capture of the histone gene HIST2H3A and its promoter by the probe chip.
  • This example demonstrates that it is feasible to use the capture chip to capture the promoter regions and exon regions of epigenetics-related genes.
  • The method for constructing a high-throughput sequencing library of the present invention, and its applications, can be used conveniently and effectively to construct and sequence high-throughput sequencing libraries of specific genomic regions of a sample. The libraries can be used for subsequent mutation detection analysis and cytosine methylation analysis, the quality of the libraries obtained is good, and the sequencing and analysis results are accurate.
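The analysis steps mentioned in the examples above (base conversion of the processed reads, the bisulfite conversion efficiency estimated from the λ-DNA standard, and the share of target regions above a coverage depth threshold) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, the input shapes and the use of a strict "greater than" threshold are hypothetical choices, not the pipeline used in the patent.

```python
def in_silico_convert(seq, strand):
    """Fully convert a sequence as the basic analysis describes:
    every C becomes T on the plus strand, every G becomes A on the minus strand."""
    seq = seq.upper()
    if strand == "+":
        return seq.replace("C", "T")
    if strand == "-":
        return seq.replace("G", "A")
    raise ValueError("strand must be '+' or '-'")


def conversion_rate(unconverted_c, total_c):
    """Conversion rate estimated from the unmethylated lambda-DNA spike-in:
    the share of its cytosine observations that were read as T.
    (Hypothetical inputs: counts of C observations in lambda-DNA reads.)"""
    if total_c <= 0:
        raise ValueError("total_c must be positive")
    return 1.0 - unconverted_c / total_c


def fraction_above_depth(depths, threshold):
    """Fraction of target regions whose coverage depth is strictly greater
    than the threshold (the text quotes depth > 1 and depth > 5)."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d > threshold) / len(depths)
```

For instance, `fraction_above_depth(region_depths, 5)` would give the kind of per-chromosome percentage plotted in Figure 2, provided `region_depths` holds one depth value per probe target region on that chromosome.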


Abstract

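The present invention provides a method for constructing a high-throughput sequencing library, comprising: fragmenting genomic DNA; end-repairing the DNA fragments; adding a base A at the 3' ends; ligating the DNA fragments carrying the sticky-end A to methylated adapters; performing hybridization capture on the ligation products with specific probes to obtain target fragments; treating the target fragments with bisulfite to convert unmethylated cytosine to uracil; PCR-amplifying the converted target fragments; and separating and purifying the amplification products, which constitute the high-throughput sequencing library. The invention also provides a method and an apparatus that use the high-throughput sequencing library to determine the methylation information of specific genomic regions of a sample.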

Description

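Method for constructing a high-throughput sequencing library and application thereof
Priority information
This application claims priority to and the benefit of Chinese patent application No. 201110362032.2, filed with the State Intellectual Property Office of China on November 15, 2011, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of biotechnology, specifically to methylation detection technology, and in particular to methylation detection of specific genomic regions. More specifically, the invention provides a method for constructing a high-throughput sequencing library, a method for determining methylation information of specific genomic regions of a sample, an apparatus for determining methylation information of specific genomic regions of a sample, and a kit for constructing a high-throughput sequencing library of specific genomic regions of a sample.
Background
DNA methylation is the most thoroughly studied epigenetic mechanism. It plays an important role in maintaining normal cell function, suppressing damage to genome integrity by parasitic DNA elements, chromatin structure modification, X-chromosome inactivation, genomic imprinting, embryonic development and human tumorigenesis, and it is currently one of the new research hotspots.
However, current research on methylation detection in specific genomic regions, such as promoter regions, CpG island regions, CpG island shore regions and imprinted gene regions, still needs improvement.
Summary of the invention
The present invention aims to solve at least one of the problems of the prior art. To this end, in order to detect methylation information of specific regions of the genome in a representative manner, the invention provides a method for constructing a high-throughput sequencing library and applications thereof.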
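According to one aspect of the present invention, a method for constructing a high-throughput sequencing library is provided. According to embodiments of the invention, the method includes the following steps: fragmenting genomic DNA to obtain DNA fragments; end-repairing the DNA fragments to obtain end-repaired DNA fragments; adding a base A to the 3' ends of the end-repaired DNA fragments to obtain DNA fragments with a sticky-end A; ligating the DNA fragments with the sticky-end A to methylated adapters to obtain ligation products; performing hybridization capture on the ligation products with specific probes to obtain target fragments; PCR-amplifying the target fragments to obtain amplification products; and separating and purifying the amplification products, which constitute the high-throughput sequencing library. According to embodiments of the invention, the specific probes that can be used are specific for known methylation sites. For example, the specific probes are designed using the human genome as the reference sequence and gene regions known to contain methylation sites as the target sequences, where the gene regions known to contain methylation sites may be the coding regions and promoter regions of at least one of the genes listed in Table 1 below.
Table 1-1 Genes related to molecular function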
Molecular function    GO ID    Gene names
CDYm AR B i ELP3 KA'OA SAP130 CREBBP ΚΛΤ8 |
MKTTLS TADA3 TADA2A TAF6L GEA5 «ΑΤ6Β 'TAF .ΛΤ5 |
Hisione (纽蛋 NC:OAi SUPT3H TAF5 SRC:AP HAT! CDY2B EPCi AT6A |
C(O:00ti4402
CLOCK EP300 TAF12 TAF i GTF3C4 MED24 KAT2B CDYL | TAF5L SNG3 PET! S? BRCA2 ELM EDFI BAZ1ASUPT7L TAFIO |
NCOA2 ΚΛΤ7 TAF SI.. TADAi |
SALLf HDACI ! HDACi HDAC8 HDAC2 HDACH) HDAC5
C;O:0<) 44{t7
HDAC'3 H1>AC4 HDAC6 iVSTA2 SiRTl Hi:)AC9
GO :0008469 PR T5 PSMTS PR T7
(¾>細 (i4M ΚΑ'ΠΑ B CA2
00:0010485 BRCA2 USP22
O:0(tI7!36 SiRT6Si T2 SI T!
EZH2 EH T2 SETD7 SUV3 H2 ASHiL MEN! SETDfi ΕΗΜ'Π
C3 ;0iH 024 OOTIL SU 3¾>Hi P DM2 SMYE SETDBi SETMA Sl;V42 H!
PRD 9 PRDM6 MLL2 WHSC1 WHSCIL! SETD A SETD2
GO:(M 1 ¾ L3MBi:iJ
HDAC? HDAC! 1 HSJAC! fflMCS. HDAC2 HDAC") HDAC5
00:0032041
HD— ΛΟ i-JDAC4 HDAC6 i«)AC9
00:0032452 KD 2 PHFS DMIA PHP2 JARED2
<K):(K>3245 'I4ori ->9KD iA
GO :0032454 M4C PHFS JHDMiD KDM 1 A ΡΉΡ2
GO:0«33?46 JMJD6
(¾)細 3749 JMJD6
<K):(K>34647 KDMSB
GO :()«34648
G :0«34 49 KDMiB
HDAC'? HDACi i HDAtM HDAC8 HDAO HDACH) HDAC 5
HDAC3 HDAC4 HDAC6 i-JDAC
GO :00 5033 Nt:OR! MAP 'i'P53
GO :(«35034 EIDI
G i SPi SH«2 Z8T87A EiDS TP53 MEF2A FOXP3
GO :0035035
EPASI GLG H!FiATAF?
CHDS CHDi CBX21NG4 KAT8 SUZI2 i Gl L3MBTL2 L.RWD ! |
PHF13 CBX5 NCAPD3 PHF8 ΪΝ02 CCDCtOS CSX4 WDR92 j
GO舰; 5064
GLY ί JHDMID CBX8 SP5NS. B8P51NG3 i C.5 TR.iM2 |
TDRD3 MSL3 UHRPI CBX7 PH 2 L3MBi:LS 腿 I
Figure imgf000004_0001
Figure imgf000005_0001
Table 1-2 Genes related to biological processes
Biological process    GO ID    Gene names
Figure imgf000006_0001
Figure imgf000007_0002
Figure imgf000007_0001
KUVBU I„BFI ARRBI MOR 4:SJ RUVBU D AP3: EPCi EPSOO TR5MI6 ΛΡΒΒί iRR TRRAP TCF3
(30:0043967
1NG3 BRCA2 USP22 YEATS-4 EP400 BROS UVBL2 EAF ORF4L1 DMAPi
00:0043968
EPCS TRRAP iKC;3 YEATS4 i3O:0»43973 L!>BS
GO:麵 3 ! INCM PHF16 PHFi 7 PHFi 5 KAT7
GO:0043¾82 ING4 MEAP6 PHFI6 ΡΗΠ7 PHFI 5 KAT?
!NC ΜΕΛΡ6 PHF16 PHFi? BRIM PHF15 AT7
SNG4 M AP6 PHF!ft KATS MSU MIX PHFi 7
<K>:0043984
S 1.3 PHF!.5 KAT? MSL2
GO:0(}43«)S5 PBMT5 PRMTl P KC!¾ COPRS
GO:0043¾8? PS6KA4 PS6 A5
RPS6KA4 PS6KA5
GO:0fM3 t>0 PS6i<Ai
€.0:004 154 KA1'2A Mi-AF BRD4
EHMT2 SUV39H2. S5-TDB2 ΚΗ Ί'ί SUV39HI
P DM5
CX MUA DPY30 Mil. KDM6A ASH2L WDR5 SETD!B LI..2 PAXfP! RHBP5 WD 82 MU..5
SETDiA
GO:翻 569 MLLCTCFL
<jO:005i57i D M'BB BRCAi MYB PAXIPf OCT DNMTi
CiO: t«!572 tlPISB CO
<;O: 05i5?3 D M'llB BRCAi PAX5 DN 'i'l
GO:«05J57 GFJiB MYB JA I 2
UB 2B
.fD
C.O:00 0i*?9 層: D6
GO:00705I2 BRCAI
GO.0070535 K S.6S
GO: 07(!537 U!MC! BRCC3
<;O: 0?0544 P11F8 JMJD5.ffli 'm) N()66 DM4A
G():<}070734 ΕΗ 'Π
GO:0i)?0932 HDA l HDAC4 Si T! HDAC:9 ί30:ί>ί}7(!933 HDAC'i RCO i I-ST eDAC4 Hf>AC<i
(30:007 i! 10 HLi::S
<K>: 0?i557
GO:«07JS9 WAC GO:!5072355 C.SG2
ίΚ»:0額) ·24). SPSi CT'BPl
ί3Ο:2('Κ)0 1 α.ΐΕί"
GO:20006I7 B CAl
<Χ):20{!()620 BRC.'Ai
GO:0( 0029 GPXI BAZJB GLMN CTCF
(:'Ο:(>ίΜ(ΚΒ0 ASIP CTCF
Epigenetics
GO舰 34 SPIi T I.M27 DNK-1T3 (IPC! C&EBZf
00:0045815 SLC50AS
D!RAS3 DNMT3L D MT3A GS 3A EKD KDM 1 B
GO:«0034 DHMT3B KDM! R ZFP57 CTCFL KC QS CTCF 基因印记
10F2 P MT7
GO:007!514 ASiP ND C5NAS AX!KI
€ιΟ;0Ι )(!ΐ:ΐ3 LW-2A SiRT2 SMARCAS SUV39HI S!RTl PB
<;O: (KH672 TLK 1 ΎΙΚ2
SA B EZH2 VCX S0X2 L WOi HN-iG2()B H.DAC5 HMGA2 SUV39H1 SOX 1 SO i
<;O: (Ki6325 :
MiSr!H4K TC iF?LI HMQ HMG2 A MiMi
ACTL6B SATBl
ASFIA SUV39H.2 HD S iAC8 SMARCAS ASF IB
Figure imgf000009_0001
BAZIB 'HD3服 ΪΡ3. TA2
CHDl HDACi RSFi ARID t i ¾ A SM A CC2 S A CDi SATB2 AT2A BAZ2A SUV39H2 t'HDiL RESE I OSO SMARCBI TAF6L MEN i BPTF .KJXAl HDAC'2 S¾JIT6B: SMYD SMARCCi SUP'f'SH SMA ;: A5 P. 2 SMA C!-I HDACS
GO:6006338
HiLSl RBBP4 S ARCA2 S ARCD3 FOXP3 S ARCA4 ΜΎΒ <Ήί>6 H FiA SMARCAI H AC4 A'"i'2BPBR.Mi Β ΪΡ3 A(:'Ti..6B SL;PT }li C!I!RA '! SOX CBX3 TTFl BAZlARBi T F! ACTLAA
KLFi SMA C:D2
SI 7'2 HDACS HiLSi SI T4 MU.2 SI T5 UBS2
G():<K !6342
SiRTi TNP1
i3O:0t;06.¼S SIET!
(;0:《 W)344
! MT5A HELLS MBD3 DNMT3B SMARCM
GO:糊 6346
SiSTS
00:0006348 SS T2 HATS
Figure imgf000010_0001
Figure imgf000011_0001
Figure imgf000011_0002
Figure imgf000012_0001
LHDC3 CREB! TALS NCO i ARlDi A PPPI RIO RCCi LDB i M EI ί A SARA HDAC2 CEBP8 X3 CT!'£D2 UBE2A議 Λ PM2 JU AR POCiZ BA D2 GATA3 TiMELESS ΊΤ53 RJH3 I3F2A PHOX2H S ARCA2 S A12 SR VSP3 S ARCD3 SCRT2 HNRNFK PAX6 SMARCA4 RIDI MEF2C UBE28 H! PNT D Xl \ TiPiN PHOX2A CBXfi PLCBl BKi)4 RXRA KDM! A PCGF2 STAT6 SS..X4 MSH6 MYOD APT.X
CBX7 PER Si Ti
GO ;000079] D MT3A
SALI.J MBD2 C8X2 CBX6 HDAC2 DAXX (H«2 MBD3 SUVi«HI SMAHCA4 MECP2 TOP2B CBX8 SALL4 UHB.Fi CB 7
Ρί . SUZ!2 BED F2 PMC!B K.iNG!
CK):0 O56?7 B,<\Z2A Si T2 BAHDI SMA CA5 SiJV39H! SiRTS
(K):i聽 6?S NAPi L3 CHAFIB ΝΛΡΙ1.2 NAP I Li Ci-iAF!A NAPH.
SMAKCA4 RBMX CEC 2 TRI 28 T 24 CBX3 ALKSHJ
GO細 57 ΐ 9
Sffi.I'l
TCP! D MT3I, SUV39H2 D MT3A VDR C X5 DNMT3B ATRX AiCFCBXS PSSPi TR1M28 CBX3 SI Ti SATSI FOXCI
HELLS LRWDI BAZiH S CEN? 8XI IKZFI D Mll
«): (励 5 ?24 SIRT6 T KSS 8P!
GO :0005 ?26 NUFSF! S.V1ASCA4 TH.i. 24
CK):0OO8623 C'HRAC'i AZ!A
ASFiA. SM— ARC.D'i S MARCH! BMiO lFi ί MAEL ΒΛ.Ζ1 B CHD3 ESR1 SMARCA1 AT2B SOX9 MVS Ml ACT A
(;;0:(»3!6!8 C.BX5 NCAPD3 CBX3
<):CK 1933 S..RWD1
GO:0i)33553 BAZ2ASUV3 Hi Si TS RRP8 eXOSO EXi)SC:i0 PSIPi EXOSCS EXOSC:4
GO: 0SS9B5 HKIGAI CDK 2A HMGA2
Figure imgf000014_0001
Figure imgf000015_0001
Figure imgf000016_0001
Figure imgf000017_0001
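With the method for constructing a high-throughput sequencing library according to embodiments of the present invention, a high-throughput sequencing library of a genomic DNA sample can be constructed effectively. In particular, a high-throughput sequencing library of the specific regions of a genomic DNA sample that contain known methylation sites can be constructed effectively, so that the library can be applied effectively and fully to high-throughput sequencing. By sequencing the library and then analysing the sequencing data, the methylation site information of the specific genomic regions is obtained effectively, which achieves methylation detection of the specific genomic regions of the genomic DNA sample.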
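According to another aspect of the present invention, a method for determining methylation information of specific genomic regions of a sample is provided. According to embodiments of the invention, the method includes the following steps: constructing a high-throughput sequencing library of the specific genomic regions of the sample by the library construction method described above; sequencing that library to obtain sequencing results; and analysing the sequencing results to determine the methylation information of the specific genomic regions of the sample.
With the method for determining methylation information of specific genomic regions of a sample according to embodiments of the invention, that information can be determined accurately, which achieves methylation detection of the specific genomic regions of the sample.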
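According to a further aspect of the present invention, an apparatus for determining methylation information of specific genomic regions of a sample is provided. According to embodiments of the invention, the apparatus includes: a library preparation unit, which is used to prepare a high-throughput sequencing library of the specific genomic regions of the sample and in which specific probes are provided; a sequencing unit, which is connected to the library preparation unit and receives the library from it, so as to sequence the library and obtain sequencing results; and a data analysis unit, which is connected to the sequencing unit and receives the sequencing results from it, so as to analyse the results and determine the methylation information of the specific genomic regions of the sample.
With the apparatus according to embodiments of the invention, the methylation information of specific genomic regions of a sample can be determined conveniently and accurately, and the apparatus can be applied to a variety of studies of methylation in specific genomic regions.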
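According to yet another aspect of the present invention, a kit for constructing a high-throughput sequencing library of specific genomic regions of a sample is provided. According to embodiments of the invention, the kit includes specific probes that are specific for known methylation sites. With the kit according to embodiments of the invention, a high-throughput sequencing library of the specific genomic regions of a sample can be constructed conveniently and effectively.
Additional aspects and advantages of the invention are given in part in the description below; in part they will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings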
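The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the description of the embodiments in conjunction with the following drawings, in which:
Figure 1 shows a schematic flow chart of the method for constructing a high-throughput sequencing library according to an embodiment of the invention;
Figure 2 shows, for the determination of methylation information of specific genomic regions by the method according to an embodiment of the invention, the percentage of captured regions on each chromosome relative to the probe target regions at different coverage depths (coverage depth > 1 and coverage depth > 5);
Figure 3 shows, for the determination of methylation information of specific genomic regions by the method according to an embodiment of the invention, the percentage of promoters on each chromosome in which methylation information was detected, relative to the total promoters of that chromosome, at different coverage depths;
Figure 4 shows, for the determination of methylation information of specific genomic regions by the method according to an embodiment of the invention, the distribution of methylation levels in the promoter regions, CpG islands, CpG island shores (referred to herein as CGI shores) and imprinted gene regions of the genome, where
(a) shows the distribution of methylation levels in the CpG island and CGI shore regions of the sample genome, determined according to an embodiment of the invention,
(b) shows the distribution of methylation levels in the promoter regions of the sample genome, determined according to an embodiment of the invention,
(c) shows the original distribution of the specific genomic regions of the sample, the distribution of reads of the high-throughput sequencing library of those regions, and the distribution of methylation levels in the promoter and CpG island regions, determined according to an embodiment of the invention;
Figure 5 shows a schematic diagram of the apparatus for determining methylation information of specific genomic regions of a sample according to an embodiment of the invention;
Figure 6 shows the distribution of insert lengths of the sequencing reads (also referred to herein as "reads") according to an embodiment of the invention;
Figure 7 shows the sequencing depth and cumulative sequencing depth of each base in the captured regions according to an embodiment of the invention;
Figure 8 shows the frequency of base mismatches at each position during read alignment according to an embodiment of the invention;
Figure 9 shows the coverage and cumulative coverage, at different depths, of the epigenetics-related genes captured from the YH cell line according to an embodiment of the invention; and
Figure 10 shows a visualization of the sequencing coverage of individual captured genes according to an embodiment of the invention.
Detailed description of the invention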
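Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
Method for constructing a high-throughput sequencing library
According to one aspect of the present invention, a method for constructing a high-throughput sequencing library is provided. Referring to Figure 1, according to embodiments of the invention, the method includes the following steps: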
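First, genomic DNA is fragmented to obtain DNA fragments. The term "DNA" as used herein may be any polymer containing deoxyribonucleotides, including but not limited to modified or unmodified DNA. Those skilled in the art will understand that the source of the genomic DNA is not particularly limited. It may be obtained by any possible route: purchased commercially, obtained directly from another laboratory, or extracted directly from a sample. According to embodiments of the invention, the genomic DNA can be extracted from a sample. According to one embodiment of the invention, the method for constructing a high-throughput sequencing library may further include a step of extracting genomic DNA from the sample. According to some specific examples of the invention, the sample may be derived from at least one of mammals, plants and microorganisms. According to some embodiments of the invention, the mammal may be at least one of human and mouse. According to one embodiment of the invention, the genomic DNA may be human whole-blood genomic DNA, preferably peripheral blood mononuclear cell genomic DNA. The inventors found that when YH cell genomic DNA is used to construct the high-throughput sequencing library, extracting the genomic DNA from the sample is convenient and easy, and the DNA obtained is of good quality with complete methylation information. The library of specific genomic regions constructed from it can be applied conveniently to high-throughput sequencing, so that the methylation information of the specific genomic regions of the sample is obtained conveniently and effectively from analysis of the sequencing data. According to embodiments of the invention, the amount of genomic DNA is not particularly limited; according to specific examples of the invention, the amount of genomic DNA is preferably 2 μg. The inventors surprisingly found that with 2 μg of genomic DNA, the library of specific genomic regions constructed by the method according to embodiments of the invention can be applied very conveniently to high-throughput sequencing technology such as Solexa sequencing, and that the library sequencing results are accurate and reproducible, the methylation information of the specific regions is accurate, and the coverage of methylation sites is high.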
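Second, the DNA fragments are end-repaired to obtain end-repaired DNA fragments. According to one embodiment of the invention, a step of purifying the DNA fragments may be included before the end repair, which makes the subsequent end repair easier. According to embodiments of the invention, the DNA fragments may be end-repaired with Klenow fragment, T4 DNA polymerase and T4 polynucleotide kinase, where the Klenow fragment has 5'→3' polymerase activity and 3'→5' exonuclease activity but lacks 5'→3' exonuclease activity. The DNA fragments can thus be end-repaired conveniently and accurately. According to embodiments of the invention, a step of purifying the end-repaired DNA fragments may also be included, which makes subsequent processing convenient.
Next, a base A is added to the 3' ends of the end-repaired DNA fragments to obtain DNA fragments with a sticky-end A. According to one embodiment of the invention, Klenow (3'-5' exo-), i.e. Klenow lacking 3'→5' exonuclease activity, may be used to add the base A to the 3' ends of the end-repaired DNA fragments. The base A can thus be added to the 3' ends conveniently and accurately. According to embodiments of the invention, a step of purifying the DNA fragments with the sticky-end A may also be included, which makes subsequent processing convenient.
Then the DNA fragments with the sticky-end A are ligated to methylated adapters to obtain ligation products. The term "methylated adapter" as used in the present invention refers to an adapter in whose nucleotide sequence all C sites are modified by methylation. According to one embodiment of the invention, before the DNA fragments with the sticky-end A are ligated to the methylated adapters, a step of methylating the adapters used for conventional sequencing may be included. This effectively avoids interference between the sequencing adapters and later operations such as the bisulfite treatment, for example the adapter sequence being changed during bisulfite treatment. Those skilled in the art will understand that the method of methylating the adapters is not particularly limited, and any method known in the art may be used to methylate the sequencing adapters.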
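According to some embodiments of the invention, the methylated adapter may further contain a tag, so that high-throughput sequencing libraries of specific genomic regions of several samples can conveniently be constructed at the same time and applied effectively on a high-throughput sequencing platform. After data analysis of the sequencing results, the tag sequence information makes it possible to distinguish accurately the library sequence information and the methylation information of the specific genomic regions of each sample. This makes full use of the sequencing platform, saves time and reduces cost.
According to embodiments of the invention, the tag is 6 bp long. During library preparation, different samples are ligated to adapters carrying different tags; before capture, the different libraries are pooled into one new library, and this new library is used for probe capture and sequencing. The sequencing data then contain the reads of all pooled samples, and the samples are separated according to the tag sequence found in each read (the sequence output by the sequencer). This approach greatly reduces cost, time and labour.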
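The tag-based separation of pooled samples described above can be sketched as follows. This is a minimal illustration, assuming (hypothetically) that the 6 bp tag occupies the first six bases of each read and that a lookup table from tag to sample name is available; the function name and the "undetermined" bin are illustrative choices, not part of the patent.

```python
from collections import defaultdict

TAG_LENGTH = 6  # tag length stated in the text


def demultiplex(reads, tag_to_sample, tag_length=TAG_LENGTH):
    """Group reads by the tag at the start of each read.

    Assumption: the tag is the first `tag_length` bases of the read.
    Reads whose tag is not in `tag_to_sample` go to "undetermined".
    The tag itself is removed from the returned sequence.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag, "undetermined")
        bins[sample].append(insert)
    return dict(bins)
```

A real pipeline would usually also tolerate one mismatch in the tag; that is left out here to keep the sketch short.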
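According to one embodiment of the invention, the DNA fragments with the sticky-end A are ligated to the methylated adapters with T4 DNA ligase, so that the ligation products are obtained conveniently. According to embodiments of the invention, a step of purifying the ligation products may also be included, which makes subsequent processing convenient.
Then hybridization capture is performed on the ligation products with specific probes to obtain target fragments. According to embodiments of the invention, the term "specific probe" here means that the probe is specific for known methylation sites. According to specific examples of the invention, the specific probes are designed using the human genome as the reference sequence and specific gene regions on the genome that are known to contain methylation sites as the target sequences. Specifically, the gene regions known to contain methylation sites include at least one selected from promoter regions, CpG island regions, CpG island shore regions and imprinted gene regions. Hybridization capture with the specific probes according to embodiments of the invention therefore effectively captures the sequences in the sample that are complementary to the target sequences, i.e. the gene regions of the sample that are known to contain methylation sites (sometimes also called "specific genomic regions" in this specification).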
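According to embodiments of the invention, the gene regions known to contain methylation sites that can be used to design the specific probes are the coding regions and promoter regions of at least one of the genes listed in Table 1. According to embodiments of the invention, the coding region is the exon region sequence, and the promoter region is the region from 2200 bp upstream to 500 bp downstream of the transcription start site of the gene. According to embodiments of the invention, the specific probes are designed using the eArray system. According to embodiments of the invention, the probe length is optionally 120mer.
The genes shown in Table 1 above were obtained by the inventors through extensive experimental screening using database resources such as Gene Ontology. The inventors unexpectedly found that probes prepared from these genes capture the desired target fragments most effectively and help subsequent research. According to embodiments of the invention, the region from 2200 bp upstream to 500 bp downstream of the TSS (transcription start site) of a gene can be used as the promoter region and the exon region sequence as the coding region of the gene, and the capture probes are designed against the sequence information of these regions. In addition, and surprisingly, the sequencing results obtained with these gene loci as targets show no bias in the coverage of the individual chromosomes.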
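The promoter definition above (2200 bp upstream to 500 bp downstream of the TSS) depends on the strand of the gene. A minimal sketch of that calculation is given below; the function name, the coordinate convention (plain integer positions, start clamped at 0) and the strand encoding are assumptions made for illustration.

```python
def promoter_window(tss, strand, upstream=2200, downstream=500):
    """Return (start, end) of the promoter window around a TSS.

    On the plus strand, "upstream" means smaller coordinates;
    on the minus strand it means larger coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(0, start), end
```

The window is 2700 bp wide on either strand unless it is clipped at the start of the chromosome.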
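According to the principle of complementary base pairing, a capture probe in the single-stranded state can bind to a complementary target sequence in the single-stranded state, and the target region is thereby captured. According to embodiments of the invention, the probe design may use either a solid-phase capture chip (probes immobilized on a solid support) or liquid-phase capture probes (probes free in solution). Solid-phase capture chips, however, are limited by factors such as probe length, probe density and high price, so liquid-phase capture is the first choice.
According to embodiments of the invention, the probes are designed with eArray, the probe design system of Agilent, with a probe length of 120mer. The probes can cover a wide range of target lengths, from less than 200 kb to 24 Mb or even longer. The eArray probe design system can conveniently use the bioinformatics tools WindowMasker and RepeatMasker to analyse the target regions and mask them, so that no probes are designed for the masked regions. This very effectively reduces capture interference in the experiment and alignment interference in the subsequent sequence analysis, and shortening the covered length also reduces cost to some extent.
According to embodiments of the invention, sequences with high CG content (CG base content above 60%) are captured less efficiently than ordinary sequences (A, T, C and G each averaging 25%), because the bases C and G differ in molecular structure. For CGI (CpG island) regions, which are a key object of study, more and better data can be obtained by increasing the number of probes designed.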
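The 60% CG-content criterion mentioned above can be checked per target region with a few lines of Python. This is a hedged sketch: the function names are hypothetical, and how a flagged region would then receive extra probes is not specified in the text and is not modelled here.

```python
def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_high_gc(seq, threshold=0.60):
    """True when the CG content exceeds the 60% cut-off quoted in the text."""
    return gc_fraction(seq) > threshold
```

Regions flagged by `is_high_gc` are the ones for which the text suggests designing more probes.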
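In addition, according to one embodiment of the invention, before the hybridization capture the method may further include a step of hybridization blocking, in which single-stranded oligonucleotides such as Cot-1 DNA and adapter blocking sequences are used to block the ligation products (in particular the repetitive regions in the genomic sequence of the ligation products) and the methylated adapters on the ligation products. The inventors surprisingly found that blocking the ligation products (in particular the repetitive regions in their genomic sequence) with Cot-1 DNA and the methylated adapters with adapter blocking sequences significantly enhances the hybridization capture of the ligation products. According to embodiments of the invention, the amount of Cot-1 DNA used is not particularly limited; according to specific examples, an excess of Cot-1 DNA is preferably used to block the repetitive regions in the genomic sequence of the ligation products. The term "excess" here means that the amount of Cot-1 DNA is far greater than the amount of ligation product to be captured, i.e. at least 2 times that amount. According to specific examples of the invention, the amount of Cot-1 DNA is preferably 5 times the amount of ligation product to be captured. According to some embodiments of the invention, if less than 5 times the amount is used, the blocking is incomplete and the strong non-specific hybridization background from the repetitive sequences interferes heavily, which seriously reduces the efficiency of the nucleic acid hybridization; if more than 5 times the amount is used, the surplus Cot-1 DNA interferes with the binding of the probes to the ligation products, which likewise reduces hybridization efficiency. Blocking the repetitive sequences in the genomic regions of the ligation products with Cot-1 DNA at 5 times the amount of ligation product is therefore convenient and effective and removes the repetitive-sequence DNA, so that in the subsequent hybridization the strong non-specific background signal from the repeats is avoided, hybridization efficiency rises markedly and the hybridization result improves. According to embodiments of the invention, the adapter blocking sequences include at least one selected from Block1 and Block2, which effectively blocks the methylated adapters on the ligation products. According to embodiments of the invention, 1 μg of ligation product may be used for the hybridization capture, which improves the capture efficiency. According to specific examples of the invention, the hybridization capture of the ligation products with the specific probes may further include capturing the target fragments with streptavidin magnetic beads, which captures the target fragments efficiently.
Then the target fragments are PCR-amplified to obtain amplification products. According to embodiments of the invention, a hot-start Taq DNA polymerase may be used to PCR-amplify the converted target fragments. According to embodiments of the invention, the type of hot-start Taq DNA polymerase is not particularly limited; according to specific examples of the invention, it may be r-taq polymerase, which gives efficient and fast PCR amplification.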
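Finally, the amplification products are separated and purified, and the resulting amplification products constitute the whole-genome methylation high-throughput sequencing library. According to embodiments of the invention, the method of separating and purifying the amplification products is not particularly limited; according to specific examples of the invention, it may be at least one selected from magnetic bead purification, column purification and 2% agarose gel electrophoresis, preferably 2% agarose gel electrophoresis. According to some specific examples of the invention, the library fragments of the high-throughput sequencing library are 300-450 bp long, so that the library can be applied conveniently and effectively on high-throughput sequencing platforms such as the Solexa platform, with good reproducibility, reliable sequencing results and fairly complete methylation information for the specific genomic regions targeted by the specific probes. According to embodiments of the invention, after the target fragments are obtained they may be treated with bisulfite to convert the unmethylated cytosines in the target fragments to uracil, giving converted target fragments. According to embodiments of the invention, before the bisulfite treatment the method may further include mixing the target fragments with fragmented λ-DNA. The inventors found that adding exogenous DNA (λ-DNA), i.e. mixing the target fragments with exogenous DNA and then treating them together with bisulfite, protects the target DNA fragments and minimizes the damage that bisulfite does to trace amounts of DNA. This further improves detection accuracy and makes methylation detection feasible for smaller amounts of genomic DNA, even at the nanogram level, for example 50-150 ng of genomic DNA. According to embodiments of the invention, the amount of fragmented λ-DNA added is not particularly limited; according to specific examples, the amount of fragmented λ-DNA is preferably 200-400 ng, more preferably 200 ng. Those skilled in the art will understand that the fragmented λ-DNA can be prepared by any method known in the art, for example together with the DNA fragmentation described above.
The bisulfite treatment can be carried out by any method known in the art. According to specific examples of the invention, a commercial kit may be used, preferably the EZ DNA Methylation-Gold Kit™ (ZYMO). The inventors surprisingly found that bisulfite treatment of the target fragments with the EZ DNA Methylation-Gold Kit™ (ZYMO) is convenient and fast and gives good results: the unmethylated cytosines in the target fragments are converted to uracil efficiently and accurately, which benefits subsequent processing.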
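Thus, according to embodiments of the invention, the captured fragments may be sequenced directly after capture, and the sequencing results can then be used to analyse single nucleotide polymorphisms (SNPs), nucleotide mutations, insertions and deletions (indels) or copy number variations (CNVs) of the genes. Alternatively, the captured fragments may be treated with bisulfite and sequenced to analyse the DNA methylation state, for example methylation density, the methylation level of different elements, cytosine methylation and differentially methylated regions (DMRs).
With the method for constructing a high-throughput sequencing library according to embodiments of the invention, a high-throughput sequencing library of the specific genomic regions of a sample can be constructed effectively and applied effectively and fully to high-throughput sequencing. By sequencing the library and then analysing the sequencing data, the methylation information of the specific genomic regions of the sample is obtained effectively, which achieves methylation detection of the specific genomic regions of the sample.
Method and apparatus for determining methylation information of specific genomic regions of a sample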
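According to another aspect of the present invention, a method for determining methylation information of specific genomic regions of a sample is provided. According to embodiments of the invention, the method includes the following steps: constructing a high-throughput sequencing library of the specific genomic regions of the sample by the library construction method according to embodiments of the invention; sequencing that library to obtain sequencing results; and analysing the sequencing results to determine the methylation information of the specific genomic regions of the sample.
According to some embodiments of the invention, the sequencing is performed with high-throughput sequencing technology. Those skilled in the art will understand that any high-throughput sequencing technology known in the art may be used; according to specific examples of the invention, the sequencing is preferably performed on a HiSeq 2000 sequencer. The inventors found that sequencing the library of specific genomic regions of the sample on the HiSeq 2000 sequencer yields the sequencing results effectively, with short run time, high efficiency, and accurate and reproducible results. With the method for determining methylation information of specific genomic regions of a sample according to embodiments of the invention, the library can be constructed effectively and sequenced accurately by high-throughput sequencing technology such as Solexa sequencing. Analysis of the sequencing data then determines the methylation information of the specific genomic regions of the sample accurately, which achieves methylation detection of those regions with many methylation sites covered and complete methylation information.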
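According to a further aspect of the present invention, an apparatus for determining methylation information of specific genomic regions of a sample is provided. Referring to Figure 5, according to one embodiment of the invention, the apparatus 1000 includes a library preparation unit 100, a sequencing unit 200 and a data analysis unit 300.
According to embodiments of the invention, the library preparation unit 100 is used to prepare a high-throughput sequencing library of the specific genomic regions of the sample, and specific probes are provided in the library preparation unit 100. According to embodiments of the invention, the specific probes are specific for known methylation sites. According to specific examples of the invention, the specific probes are designed using the human genome as the reference sequence and specific gene regions on the genome that are known to contain methylation sites as the target sequences. Specifically, the gene regions known to contain methylation sites include at least one selected from promoter regions, CpG island regions, CpG island shore regions and imprinted gene regions. Hybridization capture with the specific probes according to embodiments of the invention therefore effectively captures the sequences in the sample that are complementary to the target sequences, i.e. the gene regions of the sample that are known to contain methylation sites. The library preparation unit 100 can thus be adapted to implement the high-throughput sequencing library construction method described above. According to embodiments of the invention, the gene regions known to contain methylation sites that can be used to design the specific probes are the coding regions and promoter regions of at least one of the genes listed in Table 1. According to embodiments of the invention, the coding region is the exon region sequence, and the promoter region is the region from 2200 bp upstream to 500 bp downstream of the transcription start site of the gene. According to embodiments of the invention, the specific probes are designed using the eArray system. According to embodiments of the invention, the probe length is optionally 120mer. The probes have been described in detail above and are not described again here.
The sequencing unit 200 is connected to the library preparation unit 100. It receives the prepared high-throughput sequencing library of the specific genomic regions of the sample from the library preparation unit 100 and sequences the received library to obtain sequencing results.
The data analysis unit 300 is connected to the sequencing unit 200. It receives the sequencing results from the sequencing unit 200 and analyses them, so that the methylation information of the specific genomic regions of the sample is determined from the analysis results and methylation detection of the specific genomic regions of the sample is achieved.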
本領域技术人员,能够理解的是, 可以采用本领域中已知的任何适于进行上迷搡作的装 置作为上述各个单元的组成部件 在本文中所使用的术语 "相连" 应作广义理解, 可以是 直接相连, 也可以通过中闽媒介闽接相连, 对于本领域的普通技术人员而言, 可以根据具 体情况理解上述术语的具体含义
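With the device for determining methylation information of specific genomic regions of a sample according to embodiments of the present invention, that methylation information can be determined conveniently and accurately. The device can therefore be applied to many kinds of studies of methylation in specific genomic regions, such as genomic regions with known methylation sites, for example to detect abnormal methylation in specific genomic regions.
Kit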
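According to another aspect of the present invention, a kit for constructing a high-throughput sequencing library of specific genomic regions of a sample is provided. According to embodiments of the present invention, the kit includes specific probes that are specific for known methylation sites. According to some specific examples, the specific probes are designed using the human genome as the reference sequence and specific gene regions of the genome known to contain methylation sites as the target sequences. Specifically, the gene regions known to contain methylation sites include at least one selected from promoter regions, CpG island regions, CpG island shore regions and imprinted gene regions. Hybridization capture with the specific probes can therefore effectively capture the sequences in the sample that are complementary to the target sequences, that is, the gene regions in the sample known to contain methylation sites. According to embodiments of the present invention, the gene regions known to contain methylation sites that can be used to design the specific probes are the coding regions and promoter regions of at least one of the genes listed in Table 1. According to embodiments of the present invention, the coding region is the exon region sequence, and the promoter region is the region from 2200 bp upstream to 500 bp downstream of the transcription start site of the gene. According to embodiments of the present invention, the specific probes are designed using the eArray system. According to embodiments of the present invention, the probe length is optionally 12mer. The probes have been described in detail above and are not described again here.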
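Those skilled in the art will understand that the kit may further include any other components needed to construct a high-throughput sequencing library of specific genomic regions of a sample; these are not described here. With the kit according to embodiments of the present invention, a high-throughput sequencing library of specific genomic regions of a sample can be constructed conveniently and effectively.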
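It should be noted that the method for constructing a high-throughput sequencing library of specific genomic regions of a sample according to embodiments of the present invention, and its applications, were accomplished by the inventors of the present application through arduous creative labor and optimization work.
Embodiments of the present invention are described in detail below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature in the field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or in the product instructions were followed. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products, which may for example be purchased from Illumina.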
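Example 1:
In this example, 2 μg of genomic DNA from human peripheral blood mononuclear cells was used as the sample, and the following steps were carried out.
I. Fragmentation of genomic DNA:
The sample genomic DNA was fragmented with a Covaris S2 shearing instrument, using the parameters set out in the table below, to obtain DNA fragments.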
[Table of Covaris S2 shearing parameters, provided as images in the original: Figure imgf000025_0001 and Figure imgf000026_0001]
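The DNA fragments obtained were checked by electrophoresis. The main band was required to fall between 150 and 300 bp, with no protein or RNA contamination. DNA fragments that passed this check were purified with the QIAquick PCR Purification Kit (Qiagen) or with magnetic beads and eluted in 32 μL of elution buffer for later use.
Fragmented λ-DNA (200-400 ng) was prepared by the same method. The λ-DNA serves as exogenous unmethylated DNA.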
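II. End repair:
1) The DNA fragments obtained in the previous step were used to prepare the end-repair reaction in a 1.5 mL centrifuge tube according to the table below:
Component                             Volume
DNA fragments                         30 μL
H2O                                   45 μL
10x polynucleotide kinase buffer      10 μL
dNTPs (10 mM each)                    4 μL
T4 DNA polymerase                     5 μL
Klenow fragment                       1 μL
T4 polynucleotide kinase              5 μL
Total volume                          100 μL
2) The reaction was incubated on a Thermomixer (Eppendorf) at 20 °C for 30 min. After the reaction, the product was purified with the QIAquick PCR Purification Kit (Qiagen) and dissolved in 34 μL of elution buffer.
III. Addition of base A: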
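1) The DNA obtained in the previous step was used to prepare the A-addition reaction in a 1.5 mL centrifuge tube according to the table below:
Component                             Volume
DNA                                   32 μL
10x Blue buffer                       5 μL
dATP (diluted to 1 mM, GE)            10 μL
Klenow (3'-5' exo-)                   3 μL
Total volume                          50 μL
2) The reaction was incubated on a Thermomixer (Eppendorf) at 37 °C for 30 min. After the reaction, the product was purified with the MinElute PCR Purification Kit (Qiagen) and dissolved in 20 μL of elution buffer.
IV. Ligation of methylated adapters: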
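1) The DNA obtained in the previous step was used to prepare the ligation reaction for the methylated adapters (sometimes also called "methylated index adapters") according to the table below:
Component                             Volume
DNA                                   18 μL
2x Rapid ligation buffer              25 μL
Methylated index adapters*            4 μL
T4 DNA ligase (Rapid, L603-HC-L)      3 μL
Total volume                          50 μL
*The methylated adapter sequences are:
Adapter 1: 5'-Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
Adapter 2: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT
Alternatively, the following index adapter is used for pooled hybridization (pooling hybridization):
Adapter 2': 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNT
All C bases in the sequences of adapter 1 and adapter 2, or of adapter 1' and adapter 2', carry a protective methylation modification.
2) The reaction was incubated on a Thermomixer (Eppendorf) at 20 °C for 15 min to obtain the ligation product. After the reaction, the product was purified with the MinElute PCR Purification Kit (Qiagen), and the purified ligation product was dissolved in 22 μL of elution buffer.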
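V. Hybridization capture of target fragments:
1. Design and production of the specific probes: a set of specific probes consisting only of unique sequences was designed using the SSAHA algorithm. Specifically, the human genome hg19 was used as the reference sequence, and the regions of about 10,000 promoters, 28,000 CpG islands, 28,000 CpG island shores and 61 imprinted genes with known methylation sites across the genome were selected as target sequences for probe design. Regions shorter than 200 bp were padded to a length of 200 bp, and overlapping regions were removed. The probe sequences were required not to overlap one another, and every probe had to remain unique when up to 3 insertions, deletions or mismatches were allowed. Each synthesized DNA probe was coupled to biotin as the label for subsequent capture. The designed specific probes were then produced by Roche NimbleGen and kept for later use.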
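The target preparation described above (padding short regions to 200 bp and removing overlaps) can be sketched as follows. This is only an illustration under stated assumptions: intervals are half-open `[start, end)` on a single chromosome, padding is split evenly on both sides, and "removing overlapping regions" is interpreted here as merging overlapping intervals. The function names are hypothetical, and the SSAHA-based uniqueness check is not implemented.

```python
def pad_to_min_length(start, end, min_len=200):
    """Extend a half-open interval [start, end) to at least min_len bases."""
    length = end - start
    if length >= min_len:
        return start, end
    extra = min_len - length
    # Split the padding roughly evenly, never going below coordinate 0.
    new_start = max(0, start - extra // 2)
    return new_start, new_start + min_len


def merge_overlaps(intervals):
    """Merge overlapping half-open intervals (one chromosome)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def prepare_targets(intervals, min_len=200):
    """Pad every target to min_len, then merge any overlaps that result."""
    padded = [pad_to_min_length(s, e, min_len) for s, e in intervals]
    return merge_overlaps(padded)


if __name__ == "__main__":
    print(prepare_targets([(1000, 1050), (1100, 1400), (5000, 5300)]))
```

Padding is applied before merging because a padded short region can newly overlap its neighbour, as in the first two intervals of the usage example.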
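Table 1 shows the evaluation of target-region coverage by the specific probes according to one embodiment of the present invention. As Table 1 shows, the probes cover almost all promoter regions of the genome and most of the imprinted genes, CpG islands and CGI shore regions. The inventors found that the uncovered regions are mostly short sequence regions with some degree of repetitiveness. Adding them to the capture range of the probes would not only add a large amount of data from non-target regions; the presence of repetitive sequences might also impair capture of other regions. These regions contain little methylation information and do not significantly affect the overall methylation level, so they were not used as target sequences for the probes.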
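2. Hybridization
1) The ligation product obtained in the previous step was used to prepare the hybridization reaction according to the table below.
2) The ligation product was quantified with a Qubit fluorometer and the corresponding dsDNA HS assay kit (Invitrogen). Then 1 μg of ligation product was transferred to a new 1.5 mL EP tube, 10 μL of 1 mg/mL Cot-1 DNA and 1 nmol of each adapter blocking sequence were added, and the mixture was dried in a SpeedVac at 60 °C and kept for later use. The 2x SC hybridization buffer and SC hybridization component A were then added to the dried EP tube.
Component                                         Amount
Cot-1 DNA                                         5 μg
Ligation product                                  1 μg
Adapter blocking sequences Block1 and Block2*     1 nmol each
2x SC hybridization buffer                        7.5 μL
SC hybridization component A                      3 μL
Total volume
*The adapter blocking sequences are:
Block1: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
Block2: 5'-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
Alternatively, the following adapter blocking sequences are used for pooled hybridization:
Block1': 5'-GTGACTGGAGTTCAGACGTCTGCTCTTCCGATCTNNNNNN
Block2': 5'-ANNNNNNAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT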
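Here, the NNNNNN bases in Block1' and Block2' pair with the N bases in the sequences of adapter 1' and adapter 2', respectively.
3) The reaction mixture was mixed, centrifuged at full speed for 10 s, then transferred to a heat block and incubated at 95 °C for 10 min to denature the DNA.
4) The sample was taken out, vortexed, and centrifuged at full speed at room temperature for 10 s. It was then transferred to a 0.2 mL PCR tube or a 96-well PCR plate, 4.5 μL of the probe library described above was added, and the mixture was vortexed and centrifuged at full speed for 10 s. The PCR tube or 96-well PCR plate was placed on a PCR instrument and hybridized at 47 °C for 64-72 h, with the heated-lid temperature of the PCR instrument set to …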
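The adapter blocking sequences are the reverse complements of the corresponding adapters, which is why they can mask the adapters during hybridization. The sketch below checks that relationship. The two adapter strings are a reading of the partly garbled sequence listing in this section and should be treated as assumptions. The helper `reverse_complement` is written here for illustration and is not taken from any library.

```python
# Complement table for DNA, keeping the ambiguity code N unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Adapter sequences as read from the listing above (assumed reading).
ADAPTER_1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER_2 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"

BLOCK_1 = reverse_complement(ADAPTER_1)
BLOCK_2 = reverse_complement(ADAPTER_2)

if __name__ == "__main__":
    print("Block1:", BLOCK_1)
    print("Block2:", BLOCK_2)
```

For the pooled variants, each N in a blocker stands for whichever base pairs with the index base at the matching adapter position, so the same function applies once the index is known.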
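3. Sequence capture
1) Preparation of wash buffers
a) The following four wash buffers were diluted: 10x SC Wash Buffer I, 10x SC Wash Buffer II, 10x SC Wash Buffer III and 2x Stringent Wash Buffer. They were diluted to 1x solutions and stored; the storage time should not exceed 2 weeks.
b) 1 mL of the prepared Stringent Wash Buffer and 1 mL of SC Wash Buffer I were pre-warmed at 47 °C.
2) Preparation of streptavidin magnetic beads
a) Dynabeads M-2… Streptavidin (Invitrogen) magnetic beads were taken out of the refrigerator and mixed thoroughly, and 100 μL was transferred to a new 1.5 mL EP tube;
b) The EP tube was placed on a magnetic rack until the liquid was clear, the supernatant was carefully removed with a pipette, and 100 μL of Streptavidin Dynabead binding and wash buffer was added;
c) The mixture was vortexed for 10 s, the EP tube was returned to the magnetic rack until the liquid was clear, and the supernatant was carefully removed with a pipette;
d) The beads from the previous step were washed twice;
e) The beads were resuspended in 100 μL of Streptavidin Dynabead binding and wash buffer and transferred to a 0.2 mL tube;
f) The beads were bound on the magnetic rack (with the tube held against the rack) until the liquid was clear, and the supernatant was carefully removed with a pipette.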
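3) Capture of target fragments with streptavidin magnetic beads
The hybridization mixture was pipetted out (recording the volume remaining after hybridization), added to the prepared beads, and mixed by pipetting 10 times. The tube was then placed on a PCR instrument and incubated at 47 °C for 45 min (the heated lid of the PCR instrument should be set to 57 °C; every 15 min the tube was taken out and vortexed for 3 s to prevent the beads from settling).
4) Washing the streptavidin magnetic beads carrying the captured DNA
a) After the 45 min incubation, the mixture was transferred from the 0.2 mL tube to a 1.5 mL EP tube, the EP tube was placed on the magnetic rack until the liquid was clear, and the supernatant was carefully removed;
b) 100 μL of 1x Wash Buffer I pre-warmed to 47 °C was added, the tube was vortexed for 10 s and placed on the magnetic rack until the liquid was clear, and the supernatant was carefully removed;
c) The EP tube was removed from the magnetic rack, 200 μL of 1x Stringent Wash Buffer pre-warmed to 47 °C was added, and the beads were mixed by pipetting 10 times (this must be done quickly so that the liquid in the tube does not fall below 47 °C);
d) After incubation at 47 °C for 5 min, the EP tube was placed on the magnetic rack until the liquid was clear, and the supernatant was carefully removed;
e) Steps c) and d) were repeated, for a total of two washes with 1x Stringent Wash Buffer;
f) 200 μL of room-temperature 1x Wash Buffer I was added and the tube was vortexed for 2 min; if liquid splashed onto the cap, the tube was flicked to collect it at the bottom. The EP tube was placed on the magnetic rack until the liquid was clear, and the supernatant was carefully removed;
g) 200 μL of room-temperature 1x Wash Buffer II was added, the tube was vortexed for 1 min and placed on the magnetic rack until the liquid was clear, and the supernatant was carefully removed;
h) 200 μL of room-temperature 1x Wash Buffer III was added, the tube was vortexed for 30 s and placed on the magnetic rack until the liquid was clear, and the supernatant was carefully removed.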
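5) Elution of the target fragments bound to the streptavidin magnetic beads
a) 50 μL of SureSelect elution buffer was added to the washed beads, and the tube was vortexed for 5 s to resuspend the beads;
b) The mixture was incubated at room temperature for 10 min, and the EP tube was then placed on the magnetic rack for 10 min until the liquid was clear;
c) The supernatant was transferred with a pipette to a new 1.5 mL centrifuge tube (the supernatant now contains the captured DNA, and the beads can be discarded);
d) 50 μL of SureSelect neutralization buffer was added to the supernatant and mixed;
e) The product was purified with the MinElute PCR Purification Kit (Qiagen) and dissolved in 22 μL of elution buffer.
VI. Bisulfite treatment: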
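The 200-400 ng of fragmented λ-DNA prepared earlier was used as exogenous DNA. The hybridization-captured target fragment DNA and the exogenous DNA were bisulfite-treated together using the EZ DNA Methylation-Gold Kit (ZYMO), as follows:
1) Preparation of the CT Conversion Reagent solution: the CT Conversion Reagent (a solid mixture) was taken from the kit, and 900 μL of water, 50 μL of M-Dissolving Buffer and 30 μL of M-Dilution Buffer were added. The reagent was dissolved at room temperature with vortexing for 10 min or shaking on a shaker for 10 min.
2) Preparation of the wash buffer: 24 mL of 100% ethanol was added to the M-Wash Buffer, which was kept for later use.
3) The mixture of target fragment DNA to be converted and λ-DNA was added to a single PCR tube; if the volume was less than 20 μL, it was made up with water.
4) 130 μL of CT Conversion Reagent solution was added to the PCR tube, and the sample was mixed by flicking or by pipetting.
5) The sample tube was placed on a PCR instrument and run as follows:
98 °C for 5 min
64 °C for 2.5 h
After this, the next step was carried out immediately, or the sample was stored at 4 °C (for up to 20 hours) for later use.
6) A Zymo-Spin IC Column was placed in a Collection Tube, and 600 μL of M-Binding Buffer was added.
7) The bisulfite-treated sample was loaded into the Zymo-Spin IC Column containing the M-Binding Buffer, and the cap was closed and the column inverted to mix.
8) The column was centrifuged at full speed (>10,000 x g) for 30 s, and the flow-through in the collection tube was discarded.
9) 100 μL of M-Wash Buffer was added to the column, the column was centrifuged at full speed (>10,000 x g) for 30 s, and the liquid in the collection tube was discarded.
10) 200 μL of M-Desulphonation Buffer was added to the column, which was left at room temperature for 15 min and then centrifuged at full speed (>10,000 x g) for 30 s; the liquid in the collection tube was discarded.
11) 200 μL of M-Wash Buffer was added to the column, the column was centrifuged at full speed (>10,000 x g) for 30 s, and the liquid in the collection tube was discarded. This step was repeated once.
12) The Zymo-Spin IC Column was placed in a new 1.5 mL EP tube, 12 μL of M-Elution Buffer was added to the column matrix, and the column was left at room temperature for 2 min and then centrifuged at full speed (>10,000 x g) to elute the target fragment DNA.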
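VII. PCR amplification and separation and purification of the amplification products:
1) The target fragment DNA obtained in the previous step was used to prepare the PCR reaction as follows:
Component                             Volume
Target fragment DNA                   10 μL
dNTPs (2.5 mM each)                   4 μL
10x PCR buffer                        5 μL
JumpStart Taq DNA polymerase          0.5 μL
P1 common primer*                     1 μL
Index N                               1 μL
H2O                                   28.5 μL
Total volume                          50 μL
*The P1 common primer is:
…CTTTCCCTACACGACGCTCTTCCGATCT
The sequence of index N is: …T, where the bases N are any combination of the four bases A, T, C and G and serve as the distinguishing identifier.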
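Each reaction table in this example lists component volumes and a total, so the water needed to reach the total can be checked by subtraction. The sketch below does this for the PCR mix. The dictionary keys are shortened labels chosen here for illustration, and treating the unlabeled 28.5 μL row as water is an inference from the arithmetic rather than something the source states.

```python
def water_to_total(components_ul, total_ul):
    """Volume of water (μL) needed to bring a reaction up to total_ul."""
    used = sum(components_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return total_ul - used


# PCR mix from the table above, in microlitres (water excluded).
pcr_mix = {
    "target DNA": 10.0,
    "dNTPs": 4.0,
    "10x PCR buffer": 5.0,
    "Taq polymerase": 0.5,
    "P1 primer": 1.0,
    "index primer": 1.0,
}

if __name__ == "__main__":
    print(water_to_total(pcr_mix, 50.0), "uL water")
```

The same function applies to the end-repair, A-addition and ligation tables, where the listed components already sum to the stated totals.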
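PCR conditions:
94 °C, 1 min
18 cycles: [cycling steps provided as an image in the original: Figure imgf000031_0001]
72 °C, 5 min
12 °C, hold
2) The PCR products were run on a 2% agarose gel, and the 300-450 bp library fragments were recovered and purified with the QIAquick Gel Extraction Kit (Qiagen) and kept for later use.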
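VIII. Library quality control:
The insert size and yield of the library were checked with the Bioanalyzer analysis system (Agilent, Santa Clara, USA), and the library concentration was quantified precisely by Q-PCR.
The high-throughput sequencing library of specific genomic regions of the sample constructed in this way passed quality control and was kept for later use.
Example 2: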
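The high-throughput sequencing library of specific genomic regions of the sample constructed in Example 1 was sequenced on a Hiseq2000 sequencer with paired-end reads of 90 bases to obtain sequencing results.
Sequencing directly yields raw data, and the sequencing results are obtained by a basic analysis of these raw data. The basic analysis includes the following main steps. First, the library data of different samples are separated using the index sequences on the adapters or PCR primers. Next, the raw data are filtered to remove contamination, adapter sequences and low-quality reads. Finally, the processed data undergo base conversion: every C on the forward strand is converted to T, and every G on the complementary strand is converted to A. This gives the sequencing results of the library constructed in Example 1.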
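The base-conversion step described above can be sketched in a few lines. This illustrates only the string transformation; how the pipeline decides which strand a read belongs to is not described in the source, so the two functions below (hypothetical names) leave that choice to the caller.

```python
def convert_forward(seq):
    """In silico conversion for forward-strand reads: every C becomes T."""
    return seq.upper().replace("C", "T")


def convert_reverse(seq):
    """In silico conversion for complementary-strand reads: every G becomes A."""
    return seq.upper().replace("G", "A")


if __name__ == "__main__":
    read = "ACGTCCGA"
    print(convert_forward(read))
    print(convert_reverse(read))
```

Converting reads in this way lets a standard aligner match them against a reference that has been converted in the same manner; the original read is kept so that methylation can be called afterwards from the unconverted bases.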
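The sequencing results were analyzed to determine the methylation information of the specific genomic regions of the sample. The data analysis includes aligning the reads in the sequencing results to the reference genome with the SOAP2.01 software, with the number of allowed mismatches set to 2, to identify the uniquely aligned reads. Analysis of these reads gives the sequence information of the specific genomic regions of the sample and the methylation information of those regions.
In one aspect of this example, the conversion efficiency of the bisulfite treatment in Example 1 was calculated using single C bases at non-CpG positions as the standard, and the sequencing depth and coverage were analyzed from the sequencing results. In this example, this means analyzing the coverage of all promoter regions, CpG islands, CGI shores and imprinted gene regions of the hg19 genome, together with the coverage depth of the different regions, and thereby determining the methylation levels of the different covered regions.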
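A minimal sketch of the conversion-efficiency calculation follows, assuming that cytosines outside a CpG context are unmethylated, so that any C still read as C at such a position counts as unconverted. The inputs are simplified to equal-length, gap-free, forward-strand pairs of reference and read sequence; the function names are hypothetical, and a real pipeline would work from alignments instead.

```python
def non_cpg_conversion_counts(ref, read):
    """Count converted (T) and unconverted (C) read bases at non-CpG C's.

    `ref` and `read` are equal-length, gap-free, forward-strand strings.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same length")
    ref, read = ref.upper(), read.upper()
    converted = unconverted = 0
    for i, base in enumerate(ref):
        if base != "C":
            continue
        # Skip CpG sites, and a final C whose sequence context is unknown.
        if i + 1 >= len(ref) or ref[i + 1] == "G":
            continue
        if read[i] == "T":
            converted += 1
        elif read[i] == "C":
            unconverted += 1
    return converted, unconverted


def conversion_rate(pairs):
    """Bisulfite conversion rate over many (ref, read) pairs."""
    conv = unconv = 0
    for ref, read in pairs:
        c, u = non_cpg_conversion_counts(ref, read)
        conv += c
        unconv += u
    total = conv + unconv
    return conv / total if total else float("nan")


if __name__ == "__main__":
    print(conversion_rate([("ACCGTCAACT", "ATCGTTAACT")]))
```

The same counting applied to reads from the unmethylated λ-DNA spike-in, with CpG sites included, would give an independent estimate of conversion efficiency.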
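In addition, this example determined the capture efficiency of the specific probes of Example 1 from the sequencing results. Fig. 2 shows, for the determination of methylation information of specific genomic regions by the method according to one embodiment of the present invention, the percentage of the probe target regions on each chromosome that was captured at different coverage depths (coverage depth ≥ 1 and coverage depth ≥ 5). The sequencing data underlying Fig. 2 are: 25.5M raw reads, an alignment rate of 75.27%, about 14.9M uniquely aligned reads, and a unique alignment rate of 57.78%. As Fig. 2 shows, at coverage depth ≥ 1 more than 99% of the probes detect the methylation information of their capture regions, and at coverage depth ≥ 5 about 90% of the probes still do (≥ 1 refers to all depths of 1 or more, which includes ≥ 5). This also shows that the actual detection range of probe capture can be extended further by a suitable increase in the amount of sequencing data. The probes according to embodiments of the present invention therefore capture the target regions stably and reliably, and in combination with bisulfite treatment allow accurate methylation detection.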
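This example also analyzed, for different genomic elements, the percentage of each element in the genome for which methylation information was detected. The results are shown in Fig. 3 and Table 2. After hybridization capture and bisulfite treatment of the genome, Fig. 3 and Table 2 were produced from the following sequencing data: 25.5M raw reads, an alignment rate of 75.27%, about 14.9M uniquely aligned reads, and a unique alignment rate of 57.78%. Fig. 3 shows, for the determination of methylation information of specific genomic regions by the method according to one embodiment of the present invention, the percentage of promoters on each chromosome for which methylation information was detected, relative to all promoters on that chromosome, at different coverage depths. As Fig. 3 shows, at coverage depth greater than 5 this percentage exceeds 70% on every chromosome, which is close to the theoretical value, and at coverage depth greater than 10 (everything greater than 5 includes what is greater than 10; inventors, please clarify the exact range meant by "greater than 5", since it also includes greater than 10) it still exceeds 60% on every chromosome. Table 2 shows the distribution across chromosomes of the imprinted genes that could be detected when determining methylation information of specific genomic regions by the method according to one embodiment of the present invention. As Table 2 shows, at coverage depth ≥ 1 the methylation information of 97.6% of the imprinted genes can be detected. With the amount of sequencing data held constant, the number of genes detected falls markedly as the coverage-depth filter is raised. This indicates that, when the methylation information of imprinted genes is analyzed at high sequencing depth, the amount of sequencing data should be increased to raise the coverage depth of each imprinted gene.
This example also analyzed the distribution of methylation levels in the promoter regions, CpG islands, CGI shores and imprinted gene regions of the genome. The results are shown in Fig. 4.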
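Table 1. Coverage of the designed probes over each target region of the whole genome
Target region      Target regions   Target regions covered   Probe coverage (%)
Promoters          10018            9449                     94.32
Imprinted genes    61               41                       67.21
CpG islands        27623            11990                    43.41
CGI shores         27628            11076                    40.09

Table 2. Distribution of the detected imprinted genes on each chromosome
Chromosome   Imprinted genes (total)   Detected (≥ 1X)   Detected (≥ …X)
chr1         2                         2                 0
chr4         1                         1                 0
chr6         4                         4                 0
chr7         12                        12                …
chr8         2                         2                 0
chr9         1                         1                 …
chr10        1                         1                 0
chr11        11                        11                4
chr12        1                         1                 1
chr14        2                         2                 1
chr15        15                        13                0
chr16        1                         1                 0
chr18        1                         1                 1
chr19        2                         2                 0
chr20        5                         5                 3
Total        61                        59                12
(… marks a value that is missing or illegible in the source.)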
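Fig. 4(a) shows the distribution of methylation levels in the CpG islands and CGI shore regions of the sample genome, determined according to one embodiment of the present invention. As Fig. 4(a) shows, CpG islands with high CG content have a low methylation level, whereas the methylation level of the CGI shore regions is markedly higher than that of the CpG islands. Fig. 4(b) shows the distribution of methylation levels in the promoter regions of the sample genome, determined according to one embodiment of the present invention. As Fig. 4(b) shows, within the promoter regions the methylation level at the transcription start site is low. All of these results agree with theory. Fig. 4(c) shows the original distribution of the specific genomic regions of the sample, the distribution of reads from the high-throughput sequencing library of those regions according to one embodiment of the present invention, and the methylation-level distribution determined for the promoter and CpG island regions. As Fig. 4(c) shows, the method for determining methylation information of specific genomic regions of a sample according to embodiments of the present invention captures each specific region effectively and detects the methylation information of that region accurately.
Example 3: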
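Example 1 was repeated with the YanHuang cell line sample (Jun Wang et al. 2008), except that the gene regions known to contain methylation sites used for designing the specific probes were the coding regions and promoter regions of the genes listed in Table 1 (867 genes in total after merging duplicate genes). The probes were designed with the eArray system and manufactured by Agilent, with a probe length of 12mer. In addition, the bisulfite treatment step is not needed for resequencing and non-methylation sequencing libraries.
Multiplexed index sequencing was used, with a read length of 49 bp and an index length of 6 bp. The number of sequenced fragments was 2.67M pairs, and the test data output was about 240M. Using the bwa alignment program, the reads remaining after filtering out low-quality reads and adapter contamination were aligned to the whole human genome, and a preliminary analysis of the alignment results was carried out.
Results: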
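Table 3 gives, for the YanHuang cell line sample, the total amount of raw data, the amount of data remaining after filtering and screening, the total number of reads finally aligned to the human genome, the alignment rate, the chip capture efficiency, and related figures.
Table 3. Data output and basic alignment statistics
Item                                        Value
Target region size (Mb)                     3.413
Raw reads (n)                               5520814
Raw data yield (bp)                         231874188
Reads after filtering (n)                   5376398
Usable data yield (bp)                      225777680
Mean read length (bp)                       41.99
Bases with quality value above 20 (%)       99.31
Reads aligned to the genome (n)             5283168
Alignment rate (%)                          98.68
Uniquely aligned reads (n)                  4762261
Unique alignment rate (%)                   88.88
Reads aligned within target regions (n)     2480823
Capture efficiency (%)                      52.09
Mean depth                                  28.9387
Coverage (%) >=1X                           98.67
Coverage (%) >=10X                          81.26
Coverage (%) >=30X                          39.75
Read duplication rate (%)                   0.34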
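Two of the derived figures in Table 3 can be reproduced from the counts in the same table. The sketch below recomputes the mean read length and the capture efficiency. The choice of denominators (filtered reads for read length, uniquely aligned reads for capture efficiency) is inferred from the numbers and is not stated in the source; the variable and function names are illustrative.

```python
# Counts taken from Table 3.
filtered_reads = 5376398
usable_bases = 225777680
unique_reads = 4762261
on_target_reads = 2480823


def mean_read_length(bases, reads):
    """Average read length in bp."""
    return bases / reads


def capture_efficiency(on_target, unique):
    """Percentage of uniquely aligned reads that fall in the target regions."""
    return 100.0 * on_target / unique


if __name__ == "__main__":
    print(round(mean_read_length(usable_bases, filtered_reads), 2))
    print(round(capture_efficiency(on_target_reads, unique_reads), 2))
```

Both values agree with the table to two decimal places, which supports this reading of how the two rows were derived.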
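Table 4 gives the depth and coverage of the target regions by chromosome and by gene element. Overall, the coverage of the captured data does not differ between chromosomes.
Table 4. Distribution of the aligned reads on each chromosome
             Exons                                  Promoters
Chromosome   Mean depth   >=1X (%)   >=10X (%)      Mean depth   >=1X (%)   >=10X (%)
chr1         32.5643      99.11      85.5           31.6014      99         84.42
chr2         28.6142      98.73      …              29.2488      99.21      84.77
chr3         30.…         98.94      84.73          28.2216      99.2       82.6
chr4         28.2252      98.77      82.62          28.186       99.13      83.39
chr5         30.552       98.54      84.1           30.092       99.32      86.92
chr6         33.0796      98.29      85.21          34.0625      99.54      85.72
chr7         27.0121      98.48      80.3           30.5068      98.56      84.78
chr8         31.0447      99.74      88.57          …            98.62      76.71
chr9         29.0676      99.02      82.61          …            98.25      75.31
chr10        27.8786      99.16      82.83          27.7149      98.6       80.43
chr11        29.45        99.13      82.79          …            98.92      82.33
chr12        29.4246      98.67      82.26          32.291…      98.84      83.73
chr13        23.7642      97.54      74.79          34.5168      98.73      86.93
chr14        29.7016      99.67      82.25          30.5752      …          85.76
chr15        29.6648      99.01      80.93          29.7592      99.66      84.85
chr16        28.2079      98.26      79.6…          29.2325      98.36      83.27
chr17        …            98.63      86.27          30.4981      98.48      82.02
chr18        25.0695      98.73      75.49          34.6129      98.26      85.97
chr19        26.088       97.57      ?2.98          28.1044      95.73      ?2.88
chr20        30.171       98.08      81.9           30.3635      98.83      84.51
chr21        23.7753      94.16      73.36          26.8191      98.47      75.99
chr22        30.6012      98.26      81.74          27.5009      …          74.69
chrX         16.93?       98.66      67.18          14.918       98.79      62.83
chrY         34.6212      100        97.14          21.3816      99.78      ?5.94
(… marks a value that is missing or illegible in the source; ? is reproduced from the source.)
Fig. 6 shows the distribution of insert lengths in the sequencing data. As the figure shows, although no fragment size selection was performed, the insert size is around …0 bp. Fig. 7 shows the distribution of sequencing depth for each base in the target regions. As the figure shows, most bases (about 75%) have a coverage depth of 20X or more, and a further increase in the amount of sequencing would meet a requirement on sequencing coverage depth. Fig. 8 shows the frequency of mismatches at each base position of all aligned reads during alignment. By the nature of the sequencing chemistry, sequencing quality falls and the sequencing error rate rises as the read length increases. Fig. 8 likewise shows more mismatches toward the ends of the reads, so sequencing quality at the read ends should be taken into account in subsequent variant detection. Fig. 9 shows the sequencing coverage of the target genes. As Fig. 9 shows, at a depth of 10X or more, 80% of the genes reach a coverage above 60%, which indicates that the probes capture the genes correctly; a further increase in the amount of sequencing would allow 100% coverage of all the genes involved at a given depth. Fig. 10 shows the capture of the histone gene HIST2H3A and its promoter by the probe chip.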
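The per-gene summary reported for Fig. 9 (the share of genes in which more than 60% of bases reach 10X or more) can be computed from per-base depths as sketched below. The function names, the gene names and the depth values in the usage example are invented for illustration; only the thresholds come from the text.

```python
def fraction_at_depth(depths, min_depth):
    """Fraction of positions whose depth is at least min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def genes_passing(gene_depths, min_depth=10, min_fraction=0.6):
    """Fraction of genes with at least min_fraction of bases at >= min_depth."""
    if not gene_depths:
        return 0.0
    passing = sum(
        fraction_at_depth(depths, min_depth) >= min_fraction
        for depths in gene_depths.values()
    )
    return passing / len(gene_depths)


if __name__ == "__main__":
    # Hypothetical per-base depths for three made-up genes.
    example = {
        "geneA": [12, 15, 9, 30, 11],
        "geneB": [0, 3, 12, 8, 20],
        "geneC": [10, 10, 10, 10, 10],
    }
    print(genes_passing(example))
```

Calling `fraction_at_depth` on the depths of all target bases with thresholds of 1, 10 and 30 gives the kind of coverage rows listed in Table 3.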
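This example thus demonstrates the feasibility of using chip capture for the promoter and exon regions of epigenome-related genes. The captured material can be used for subsequent variant detection analysis and cytosine methylation detection analysis. To improve detection accuracy, a larger amount of sample loaded on the sequencer and a greater sequencing depth are recommended.
Industrial applicability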
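The method for constructing a high-throughput sequencing library of the present invention, and its applications, can be applied conveniently and effectively to the construction and sequencing of high-throughput sequencing libraries of specific genomic regions of a sample, and in turn to subsequent variant detection analysis and cytosine methylation detection analysis. The libraries obtained are of good quality, and the sequencing and analysis results are accurate.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions may be made to those details, and that all such changes fall within the scope of protection of the present invention. The full scope of the present invention is given by the appended claims and any equivalents thereof.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, illustrative uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.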

Claims
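1. A method for constructing a high-throughput sequencing library, characterized in that it comprises the following steps:
fragmenting genomic DNA to obtain DNA fragments;
end-repairing the DNA fragments to obtain end-repaired DNA fragments;
adding a base A to the 3' ends of the end-repaired DNA fragments to obtain DNA fragments having a sticky-end A;
ligating the DNA fragments having a sticky-end A to methylated adapters to obtain ligation products;
performing hybridization capture on the ligation products with specific probes to obtain target fragments;
amplifying the target fragments by PCR to obtain amplification products; and
separating and purifying the amplification products, the amplification products constituting the high-throughput sequencing library.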
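2. The method according to claim 1, characterized in that, after the hybridization capture and before the PCR amplification, the target fragments are subjected to bisulfite treatment to convert unmethylated cytosines in the target fragments to uracil.
3. The method according to claim …, characterized in that it further comprises a step of extracting genomic DNA from a sample.
4. The method according to claim 2, characterized in that the sample is derived from at least one of a mammal, a plant and a microorganism.
5. The method according to claim 4, characterized in that the mammal is at least one of human and mouse.
6. The method according to claim 5, characterized in that the genomic DNA is human whole blood genomic DNA.
7. The method according to claim 6, characterized in that the genomic DNA is peripheral blood mononuclear cell genomic DNA.
8. The method according to claim 1, characterized in that the amount of the genomic DNA is 2 μg.
9. The method according to claim 1, characterized in that the genomic DNA is fragmented with a Covaris S2 shearing instrument.
10. The method according to claim 1, characterized in that the length of the DNA fragments is about 150-300 bp.
11. The method according to claim 10, characterized in that the length of the DNA fragments is about 200-300 bp.
12. The method according to claim 1, characterized in that, before the DNA fragments are end-repaired, it further comprises a step of purifying the DNA fragments.
13. The method according to claim 12, characterized in that the end repair of the DNA fragments is performed with Klenow fragment, T4 DNA polymerase and T4 polynucleotide kinase, wherein the Klenow fragment has 5'→3' polymerase activity and 3'→5' exonuclease activity but lacks 5'→3' exonuclease activity.
14. The method according to claim 3, characterized in that the addition of a base A to the 3' ends of the end-repaired DNA fragments is performed with Klenow (3'-5' exo-).
15. The method according to claim 1, characterized in that the methylated adapters contain an index.
16. The method according to claim 1, characterized in that, before the DNA fragments having a sticky-end A are ligated to the methylated adapters, it further comprises a step of methylating the adapters.
17. The method according to claim 1, characterized in that the ligation of the DNA fragments having a sticky-end A to the methylated adapters is performed with T4 DNA ligase.
18. The method according to claim 1, characterized in that, after the ligation products are obtained, it further comprises a step of purifying the ligation products.
19. The method according to claim 1, characterized in that the specific probes are specific for known methylation sites.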
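20. The method according to claim 19, characterized in that the specific probes are designed using the human genome as the reference sequence and gene regions known to contain methylation sites as the target sequences.
21. The method according to claim 20, characterized in that the gene regions known to contain methylation sites include at least one selected from promoter regions, CpG island regions, CpG island shore regions and imprinted gene regions.
22. The method according to claim 20, characterized in that the gene regions known to contain methylation sites are the coding regions and promoter regions of at least one of the genes listed in Table 1.
23. The method according to claim 22, characterized in that the coding region is the exon region sequence, and the promoter region is the region from 2200 bp upstream to 500 bp downstream of the transcription start site of the gene.
24. The method according to claim 23, characterized in that the specific probes are designed using the eArray system.
25. The method according to claim 24, characterized in that, optionally, the length of the probes is 12mer.
26. The method according to claim 1, characterized in that, before the hybridization capture, it further comprises a step of hybridization blocking of the ligation products and of the methylated adapters on the ligation products with Cot-1 DNA and adapter blocking sequences, respectively.
27. The method according to claim 26, characterized in that an excess of Cot-1 DNA is used for the hybridization blocking of the ligation products.
28. The method according to claim 27, characterized in that the adapter blocking sequences include at least one selected from Block1 and Block2.
29. The method according to claim 1, characterized in that 1 μg of ligation product is used for the hybridization capture.
30. The method according to claim 1, characterized in that the hybridization capture of the ligation products with specific probes further comprises capturing the target fragments with streptavidin magnetic beads.
31. The method according to claim 2, characterized in that, before the target fragments are subjected to bisulfite treatment, it further comprises mixing the target fragments with fragmented λ-DNA.
32. The method according to claim 31, characterized in that the amount of the fragmented λ-DNA is 200-400 ng.
33. The method according to claim 32, characterized in that the amount of the fragmented λ-DNA is 200 ng.
34. The method according to claim 2, characterized in that the bisulfite treatment of the target fragments is performed with the EZ DNA Methylation-Gold Kit (ZYMO).
35. The method according to claim 1, characterized in that the PCR amplification is performed with a hot-start Taq DNA polymerase.
36. The method according to claim 1, characterized in that the amplification products are separated and purified by at least one selected from magnetic bead purification, purification column purification and 2% agarose gel electrophoresis.
37. The method according to claim 1, characterized in that the amplification products are separated and purified by 2% agarose gel electrophoresis.
38. The method according to claim 1, characterized in that the library fragment length of the high-throughput sequencing library is 300-450 bp.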
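39. A method for determining methylation information of specific genomic regions of a sample, characterized in that it comprises the following steps:
constructing a high-throughput sequencing library of the specific genomic regions of the sample by the method according to any one of claims 1-38;
sequencing the high-throughput sequencing library of the specific genomic regions of the sample to obtain sequencing results; and
analyzing the sequencing results to determine the methylation information of the specific genomic regions of the sample.
40. The method according to claim 39, characterized in that the sequencing is performed using high-throughput sequencing technology.
41. The method according to claim 39, characterized in that the sequencing is performed on a Hiseq2000 sequencer.
42. A device for determining methylation information of specific genomic regions of a sample, characterized in that it comprises:
a library preparation unit for preparing a high-throughput sequencing library of the specific genomic regions of the sample, specific probes being provided in the library preparation unit;
a sequencing unit, connected to the library preparation unit, which receives the high-throughput sequencing library of the specific genomic regions of the sample from the library preparation unit and sequences it to obtain sequencing results; and
a data analysis unit, connected to the sequencing unit, which receives the sequencing results from the sequencing unit and analyzes them to determine the methylation information of the specific genomic regions of the sample.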
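43. The device according to claim 42, characterized in that the specific probes are specific for known methylation sites.
44. The device according to claim 43, characterized in that the specific probes are designed using the human genome as the reference sequence and gene regions known to contain methylation sites as the target sequences.
45. The device according to claim 44, characterized in that the gene regions known to contain methylation sites include at least one selected from promoter regions, CpG island regions, CpG island shore regions and imprinted gene regions.
46. The device according to claim 44, characterized in that the gene regions known to contain methylation sites are the coding regions and promoter regions of at least one of the genes listed in Table 1.
47. The device according to claim 46, characterized in that the coding region is the exon region sequence, and the promoter region is the region from 2200 bp upstream to 500 bp downstream of the transcription start site of the gene.
48. The device according to claim 47, characterized in that the specific probes are designed using the eArray system.
49. The device according to claim 48, characterized in that the length of the probes is 12mer.
50. A kit for constructing a high-throughput sequencing library of specific genomic regions of a sample, characterized in that it comprises:
specific probes, the specific probes being specific for known methylation sites.
51. The kit according to claim 50, characterized in that the specific probes are designed using the human genome as the reference sequence and gene regions known to contain methylation sites as the target sequences.
52. The kit according to claim 51, characterized in that the gene regions known to contain methylation sites include at least one selected from promoter regions, CpG island regions, CpG island shore regions and imprinted gene regions.
53. The kit according to claim 52, characterized in that the gene regions known to contain methylation sites are the coding regions and promoter regions of at least one of the genes listed in Table 1.
54. The kit according to claim 53, characterized in that the coding region is the exon region sequence, and the promoter region is the region from 2200 bp upstream to 500 bp downstream of the transcription start site of the gene.
55. The kit according to claim 54, characterized in that the specific probes are designed using the eArray system.
56. The kit according to claim 55, characterized in that the length of the probes is 12mer.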
PCT/CN2012/084691 2011-11-15 2012-11-15 高通量测序文库的构建方法及其应用 WO2013071876A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/358,674 US9920363B2 (en) 2011-11-15 2012-11-15 Constructing method of high-throughput sequencing library and use thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110362032.2A CN103103624B (zh) 2011-11-15 2011-11-15 高通量测序文库的构建方法及其应用
CN201110362032.2 2011-11-15

Publications (1)

Publication Number Publication Date
WO2013071876A1 true WO2013071876A1 (zh) 2013-05-23

Family

ID=48311805

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/084691 WO2013071876A1 (zh) 2011-11-15 2012-11-15 高通量测序文库的构建方法及其应用

Country Status (3)

Country Link
US (1) US9920363B2 (zh)
CN (1) CN103103624B (zh)
WO (1) WO2013071876A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110195095A (zh) * 2018-02-27 2019-09-03 上海鲸舟基因科技有限公司 一种新的基因组甲基化文库的构建方法和应用

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104250663B (zh) 2013-06-27 2017-09-15 北京大学 甲基化CpG岛的高通量测序检测方法
WO2015042980A1 (zh) * 2013-09-30 2015-04-02 深圳华大基因科技有限公司 确定染色体预定区域中snp信息的方法、系统和计算机可读介质
CN103555856B (zh) * 2013-11-15 2015-02-11 复旦大学 一种全基因组dna甲基化的导向测序技术
EP3192869B1 (en) 2014-09-12 2019-03-27 MGI Tech Co., Ltd. Isolated oligonucleotide and use thereof in nucleic acid sequencing
CN104293938B (zh) * 2014-09-30 2017-11-03 天津华大基因科技有限公司 构建测序文库的方法及其应用
CN104294371B (zh) * 2014-09-30 2017-07-04 天津华大基因科技有限公司 构建测序文库的方法及其应用
CN104264231B (zh) * 2014-09-30 2017-04-19 天津华大基因科技有限公司 构建测序文库的方法及其应用
CN105603052B (zh) * 2014-11-11 2021-03-19 武汉华大医学检验所有限公司 探针及其用途
WO2016082057A1 (zh) * 2014-11-25 2016-06-02 深圳华大基因研究院 游离dna测序文库的构建方法
CN107002292B (zh) * 2014-11-26 2019-03-26 深圳华大智造科技有限公司 一种核酸的双接头单链环状文库的构建方法和试剂
EP3225721B1 (en) * 2014-11-26 2019-07-24 MGI Tech Co., Ltd. Method and reagent for constructing nucleic acid double-linker single-strand cyclical library
CN104532360B (zh) * 2014-12-17 2017-02-22 北京诺禾致源科技股份有限公司 全基因组甲基化测序文库及其构建方法
CN104561362B (zh) * 2015-02-03 2017-06-06 北京诺禾致源科技股份有限公司 高通量测序文库及其构建方法
CN104818336A (zh) * 2015-05-13 2015-08-05 广州燃石医学检验所有限公司 一种基于多探针富集56基因靶区域的方法
CN104894651B (zh) * 2015-06-29 2017-04-12 天津诺禾医学检验所有限公司 微量起始dna的高通量测序文库构建方法及其所构建的高通量测序文库
CN115044645A (zh) * 2015-11-11 2022-09-13 分析生物科学有限公司 Dna文库的高效率构建
CN105907748B (zh) * 2016-05-10 2017-10-13 广州嘉检医学检测有限公司 一种基于高通量测序的线粒体基因组文库及其构建方法
CN105925562A (zh) * 2016-05-10 2016-09-07 广州嘉检医学检测有限公司 一种富集4000人类致病靶基因的方法及试剂盒
RU2019108294A (ru) 2016-08-25 2020-09-25 Резолюшн Байосайенс, Инк. Способы обнаружения изменений количества геномных копий в образцах днк
CN106754309B (zh) * 2016-12-21 2019-04-26 山东艾克韦生物技术有限公司 应用于第二代高通量测序的全自动rna文库制备装置
CN106701531B (zh) * 2016-12-21 2019-02-01 山东艾克韦生物技术有限公司 应用于第二代高通量测序的全自动dna文库制备装置的测序方法
CN107460238A (zh) * 2017-07-07 2017-12-12 沈阳宁沪科技有限公司 一种无创高通量甲基化前列腺癌诊断、研究和治疗方法
US11725305B2 (en) * 2017-07-17 2023-08-15 SeqOnce Biosciences, Inc. Rapid library construction for high throughput sequencing
CN108251504A (zh) * 2018-01-17 2018-07-06 翌圣生物科技(上海)有限公司 一种超快速构建基因组dna测序文库的方法和试剂盒
CN110468179B (zh) * 2018-05-10 2021-03-05 北京大学 选择性扩增核酸序列的方法
CN108753954B (zh) * 2018-06-26 2022-11-18 中南大学湘雅医院 痴呆相关基因的捕获探针组、试剂盒、文库构建方法和用途
CN109023537B (zh) * 2018-09-04 2021-10-08 上海交通大学 一种微量dna样品高通量测序文库的构建技术
CA3111887A1 (en) 2018-09-27 2020-04-02 Grail, Inc. Methylation markers and targeted methylation probe panel
CN109536579B (zh) * 2018-11-05 2022-04-22 深圳市艾斯基因科技有限公司 单链测序文库的构建方法及其应用
CN109609613B (zh) * 2019-01-25 2022-03-11 艾吉泰康生物科技(北京)有限公司 一种dna羟甲基化目标区域捕获测序方法
AU2020345621A1 (en) * 2019-09-09 2022-04-07 University Of Utah Research Foundation Targeted sequencing to detect and quantify low levels of methylated DNA
CN112522418A (zh) * 2019-09-19 2021-03-19 公安部物证鉴定中心 一种用于检测人类基因组dna的遗传标记物
CA3147326A1 (en) * 2019-09-30 2021-04-08 Ushati DAS CHAKRAVARTY Methods of preparing dual indexed methyl-seq libraries
CN111041069B (zh) * 2019-12-26 2021-01-19 人和未来生物科技(长沙)有限公司 一种低起始量dna样本的高通量测序文库构建方法及其应用
CN112662760A (zh) * 2020-02-25 2021-04-16 博尔诚(北京)科技有限公司 一种癌症基因甲基化检测系统和在该系统在中执行的癌症体外检测方法
CN111627499B (zh) * 2020-05-27 2020-12-08 广州市基准医疗有限责任公司 甲基化水平的向量化表征、特定测序区间检测方法和装置
CN112251506A (zh) * 2020-07-23 2021-01-22 中国辐射防护研究院 一种基于Taqman探针法的UIMC1基因突变位点检测试剂盒及其用途
CN112301115B (zh) * 2020-09-22 2022-12-09 厦门艾德生物医药科技股份有限公司 一种基于高通量测序的FGFRs基因突变的检测方法及探针序列
CN112779320B (zh) * 2020-12-04 2023-07-14 深圳市易基因科技有限公司 多区域dna甲基化检测探针设计及其检测方法
CN113584600A (zh) * 2021-08-11 2021-11-02 翌圣生物科技(上海)股份有限公司 一种全基因组甲基化单链dna建库方法

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008096146A1 (en) * 2007-02-07 2008-08-14 Solexa Limited Preparation of templates for methylation analysis
CN101802223A (zh) * 2007-08-15 2010-08-11 香港大学 用于高通量亚硫酸氢盐dna-测序的方法和组合物及其用途

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5198284B2 (ja) * 2005-12-22 2013-05-15 キージーン ナムローゼ フェンノートシャップ 高処理量配列決定技術を使用する転写産物の特徴づけのための改良された戦略
US20110318738A1 (en) * 2008-12-23 2011-12-29 University Of Utah Research Foundation Identification and regulation of a novel dna demethylase system
WO2012058634A2 (en) * 2010-10-28 2012-05-03 Salk Institute For Biological Studies Epigenomic induced pluripotent stem cell signatures

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TEER, J.K. ET AL.: "Systematic Comparison of Three Genomic Enrichment Methods for Massively Parallel DNA Sequencing", GENOME RESEARCH, vol. 20, 2010, pages 1420 - 1431, XP055074121, DOI: doi:10.1101/gr.106716.110 *

Also Published As

Publication number Publication date
US20140329697A1 (en) 2014-11-06
CN103103624B (zh) 2014-12-31
CN103103624A (zh) 2013-05-15
US9920363B2 (en) 2018-03-20

Similar Documents

Publication Publication Date Title
WO2013071876A1 (zh) 高通量测序文库的构建方法及其应用
EP3377625B1 (en) Method for controlled dna fragmentation
Zhang et al. Strand-specific libraries for high throughput RNA sequencing (RNA-Seq) prepared without poly (A) selection
JP6679576B2 (ja) 小胞性リンカー、ならびに核酸ライブラリ構築およびシーケンシングにおけるその使用
JP6324962B2 (ja) 標的rna枯渇化組成物を調製するための方法およびキット
CN106591441B (zh) 基于全基因捕获测序的α和/或β-地中海贫血突变的检测探针、方法、芯片及应用
Lee et al. Improved reduced representation bisulfite sequencing for epigenomic profiling of clinical samples
CN113661249A (zh) 用于分离无细胞dna的组合物和方法
JP7379418B2 (ja) 腫瘍のディープシークエンシングプロファイリング
Fox-Walsh et al. A multiplex RNA-seq strategy to profile poly (A+) RNA: application to analysis of transcription response and 3′ end formation
WO2012037878A1 (zh) 核酸标签及其应用
JP6925424B2 (ja) 短いdna断片を連結することによる一分子シーケンスのスループットを増加する方法
WO2013075629A1 (zh) 一种检测核酸羟甲基化修饰的方法及其应用
WO2012116661A1 (zh) Dna标签及其应用
WO2008045575A2 (en) Sequencing method
WO2020165433A1 (en) Haplotagging - haplotype phasing and single-tube combinatorial barcoding of nucleic acid molecules using bead-immobilized tn5 transposase
WO2020219759A1 (en) Methods and compositions for enrichment of target nucleic acids
WO2013075313A1 (zh) 一种检测病毒在待测样本中整合方式的探针及其制备方法和应用
Chen et al. BisQC: an operational pipeline for multiplexed bisulfite sequencing
CN113166809A (zh) 一种dna甲基化检测的方法、试剂盒、装置和应用
EP3950956A1 (en) Method and system for constructing sequencing library on the basis of methylated dna target region, and use thereof
US20240084291A1 (en) Methods and compositions for sequencing library preparation
WO2014086037A1 (zh) 构建核酸测序文库的方法及其应用
CN105603052B (zh) 探针及其用途
CN114787385A (zh) 用于检测核酸修饰的方法和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12850628

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 14358674

Country of ref document: US

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/11/2014)

122 Ep: pct application non-entry in european phase

Ref document number: 12850628

Country of ref document: EP

Kind code of ref document: A1