EP3122879A1 - Method for preparing nucleic acids - Google Patents

Method for preparing nucleic acids

Info

Publication number
EP3122879A1
Authority
EP
European Patent Office
Prior art keywords
nucleic acids
adaptor
nucleic acid
double
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP15713006.3A
Other languages
German (de)
English (en)
Inventor
Shankar Balasubramanian
Eun-Ang RAIBER
Gordon MCINROY
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambridge Enterprise Ltd
Original Assignee
Cambridge Enterprise Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambridge Enterprise Ltd
Publication of EP3122879A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1093: General methods of preparing gene libraries, not provided for in other subgroups
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • This invention relates to the preparation of nucleic acids, for example bisulfite-treated nucleic acids, for the analysis of modified cytosine marks.
  • 5-methylcytosine (5mC), formed by methylation at the C5 position of the DNA base cytosine, is an important epigenetic mark. 5mC has been shown to regulate gene expression (Deaton, A. M.; Bird, A. Genes & development 2011, 25, 1010-22) and is involved in a plethora of important processes including X-chromosome inactivation (Jones, P. A.; Takai, D. Science (New York, N.Y.) 2001, 293, 1068-70), genomic imprinting and cancer progression (Jones, P. A. Oncogene 2002, 21, 5358-60).
  • cytosine include 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC).
  • 5hmC 5-hydroxymethylcytosine
  • 5fC 5-formylcytosine
  • This positional information can be attained by a variety of methods, including bisulfite sequencing (BS-seq) and variations of bisulfite sequencing, such as oxidative bisulfite sequencing (OxBS-seq; Booth et al (2013) Nat Protoc 8 1841-1851; Booth et al (2012) Science 336 934-937) and reductive bisulfite sequencing (redBS-seq; Nature Chemistry 6, 435-440 (2014)).
  • BS-seq bisulfite sequencing
  • OxBS-seq oxidative bisulfite sequencing
  • redBS-seq reductive bisulfite sequencing
  • a key step in bisulfite sequencing involves the chemical
  • compositions reduces the coverage obtained during sequencing, i.e. read depth is lost in those areas. This leads to an additional and often overlooked problem: loss of quantitative power.
  • Bisulfite sequencing is theoretically able to give the percentage methylation at single base resolution. This is achieved by exploiting the depth of coverage and digital readout that next generation sequencing (NGS) techniques give for a specific base. If 18 out of 30 reads covering a base indicate 5mC at that position, the location is said to be 60% methylated. However, after bisulfite treatment, the composition of reads changes to a greater or lesser extent,
  • the present inventors have developed a process that provides improved yields of nucleic acids, for example bisulfite-treated nucleic acids, that carry adaptors on both ends and are suitable for sequencing.
  • fragments and the associated information can be recovered by employing a two-step ligation procedure, where a first adaptor is added before bisulfite treatment and a second adaptor afterwards.
  • An aspect of the invention provides a method of preparing a nucleic acid library comprising;
  • Each double-stranded nucleic acid in the library may comprise a target nucleic acid from the population with the first adaptor at a first end and the second adaptor at a second end.
  • the nucleic acids with the 3' adaptor sequence may be treated with bisulfite.
  • the double-stranded nucleic acids in the library may be denatured to produce a library of nucleic acid strands having a first adaptor sequence at a first end and a second adaptor sequence at a second end.
  • Each nucleic acid strand in the library may comprise the sequence of a target nucleic acid from the population with the first adaptor sequence at a first end and the second adaptor sequence at a second end.
  • nucleic acids in the nucleic acid library may be interrogated, for example to determine the identity of one or more bases in the target nucleic acid sequence.
  • Suitable methods of interrogation include sequencing or hybridisation, for example to a probe, e.g. a probe immobilised on an array.
  • a method may comprise sequencing nucleic acids in the nucleic acid library following preparation as described above.
  • a nucleic acid library is a diverse collection of single or double stranded target nucleic acids.
  • the nucleic acids have adapted ends i.e. the nucleic acids in the library have an adaptor at each end.
  • the presence of terminal adaptors at the ends of the target nucleic acid sequence allows the nucleic acids in the library to be sequenced.
  • all or substantially all of the nucleic acids in a library are sequenceable (i.e. the nucleic acids in the library each comprise adaptors at both ends) following production as described herein.
  • Nucleic acids may be ribonucleic acids (RNA) or more preferably deoxyribonucleic acids (DNA).
  • the population of nucleic acids may be DNA molecules.
  • RNA molecules in the population may comprise all or part of the sequence of one or more genes, including exons, introns or upstream or downstream regulatory elements, and/or the sequences may comprise a genomic sequence that is not associated with a gene.
  • a population of double-stranded DNA molecules may represent the whole genome or a specific genomic locus of an organism or a population of organisms.
  • the population of nucleic acids may comprise one or more CpG islands, GC-rich regions (GC content > 60%) and/or AT-rich regions (AT content > 60%).
  • the population of target nucleic acids may be obtained or isolated from a sample of cells, for example, mammalian cells, preferably human cells. Suitable samples include isolated cells and tissue samples, such as biopsies.
  • Suitable cells include somatic and germ-line cells and may be at any stage of development, including fully or partially differentiated cells or non-differentiated or pluripotent cells, including stem cells, such as adult or somatic stem cells, foetal stem cells or embryonic stem cells. Suitable cells also include induced pluripotent stem cells which may be derived from any type of somatic cell in accordance with standard techniques.
  • target nucleic acids may be obtained or isolated from neural cells, including neurons and glial cells, contractile muscle cells, smooth muscle cells, liver cells, hormone synthesising cells, sebaceous cells, pancreatic islet cells, adrenal cortex cells, fibroblasts, keratinocytes, endothelial and urothelial cells, osteocytes, and chondrocytes.
  • Suitable cells also include cells associated with disease
  • cancer cells such as carcinoma, sarcoma, lymphoma, blastoma or germ-line tumour cells, and cells with the genotype of a genetic disorder, such as Huntington's disease, cystic fibrosis, sickle cell disease, phenylketonuria, Down syndrome or Marfan syndrome.
  • the population of target nucleic acids may be obtained from a population of cells or an individual cell (e.g. single cell genomics). The analysis of nucleic acids from a single cell may, for example, allow genetic variability and epigenetic variability in individual cells and cell-types to be determined.
  • the population of target nucleic acids may be obtained from a sample of biological fluid, for example a sample of amniotic fluid, cerebrospinal fluid, mucus, sebum, blood, plasma, serum, urine or saliva.
  • Methods of extracting and isolating target nucleic acids, such as genomic DNA or RNA, from an individual cell, a sample of cells or a biological fluid, are well-known in the art.
  • genomic DNA or RNA may be isolated using any convenient isolation
  • a sample of target nucleic acids, such as genomic DNA or RNA, may be fragmented to produce target nucleic acid fragments. Fragmentation may reduce the size of the nucleic acids in the population. For example, following fragmentation, the nucleic acids may be 10bp to 5000bp, preferably 20bp to 2000bp or 30bp to 1000bp.
  • Suitable fragmentation methods include nebulization, sonication or acoustic shearing, mechanical shearing and endonuclease digestion. The whole or a fraction of the fragmented nucleic acid sample may be used as described herein.
  • Suitable fractions of genomic DNA and/or RNA may be based on size or other criteria.
  • a fraction of genomic DNA and/or RNA fragments which is enriched for CpG islands (CGIs) may be used as described herein.
  • nucleic acid fragments may be repaired to produce a population of blunt-ended nucleic acids.
  • Suitable methods of repairing nucleic acid ends are well-known in the art.
  • fragmented nucleic acids may be converted into blunt-ended molecules by filling in 5' overhangs using a 5' to 3' polymerase, and removing 3' overhangs using a 3' to 5' exonuclease, in accordance with standard techniques.
  • Suitable polymerases include T4 DNA polymerase and/or Klenow fragment.
  • the ends of the nucleic acids are not treated with a 5' kinase, such as T4 polynucleotide kinase, and remain unphosphorylated at the 5' ends. This may be useful in preventing the ligation of adaptors or other nucleotide sequences to the 5' ends of the nucleic acid strands.
  • the target nucleic acids in the population are blunt-ended and may, in some preferred embodiments, comprise free hydroxyls at both the 5' and 3' ends of each strand.
  • Suitable techniques and kits for the end-repair of nucleic acid molecules are widely available from commercial suppliers (e.g. End-It™, Epicentre; NEBNext™ End Repair Module, New England Biolabs; Fast DNA End Repair, Thermo Fisher Scientific; DNA End Repair Mix, Life Technologies; Paired-End Sample Prep Kit, Illumina Inc).
  • An adaptor or adaptor oligonucleotide may be ligated directly to the 3' strand of the blunt-ended target nucleic acids in the population as described herein or the blunt-ends may be modified before ligation of the adaptor oligonucleotide.
  • a one-base overhang consisting of an adenine residue (A-tail) may be added to the 3' strands of the blunt ends. This 3' overhang may facilitate ligation of the adaptor oligonucleotide.
  • Other suitable modifications to facilitate the ligation of nucleic acids and oligonucleotides are well-known in the art.
  • the double-stranded target nucleic acids may initially comprise a 5' phosphate group, and a method may comprise removing this 5' phosphate group, for example using a phosphatase, such as antarctic phosphatase or alkaline phosphatase, to produce double-stranded nucleic acids that lack 5' phosphate groups. This may prevent the ligation of oligonucleotides to the 5' ends of the DNA strands.
  • the double-stranded nucleic acids in the population may comprise 3' adenine overhangs and lack 5' phosphate groups following end-repair and/or modification.
  • the double-stranded nucleic acids may comprise a 5' phosphate group but ligation of an oligonucleotide to the 5' ends of the strands may be blocked by a blocking group on the oligonucleotide, for example a 3' blocking group (e.g. a group other than a 3' hydroxyl) or a 3' dideoxynucleotide residue.
  • An adaptor is a short nucleic acid at an end of a target nucleic acid that facilitates the sequencing of the target nucleic acid.
  • Adaptors may be located at both ends of a nucleic acid.
  • a nucleic acid may have different adaptors at each end, or more preferably the same adaptor at each end.
  • the adaptors at the ends of a nucleic acid are preferably full-length sequencing adaptors that allow the sequencing of the nucleic acids without the need for the
  • a suitable adaptor for a single-stranded nucleic acid may comprise an adaptor sequence.
  • a suitable adaptor for a double stranded nucleic acid may comprise an adaptor sequence and a complementary sequence which hybridises to all or part of the adaptor sequence, such that the adaptor comprises a double-stranded portion that is ligated to the target nucleic acid and a single-stranded overhang (i.e. a double-stranded region proximal to the target nucleic acid and a single-stranded tail).
  • a nucleic acid in a library may comprise a first adaptor at one end (e.g. a first end) and a second adaptor at the other end (e.g. a second end).
  • the sequence of the target nucleic acid is located between the first and second adaptors.
  • the nucleic acids in the library may have the same first adaptor at their 3' ends and the same second adaptor at their 5' ends i.e. all of the nucleic acids in the library may be flanked by the same pair of adaptors.
  • the first and second adaptors may be different or more preferably the same (i.e. the nucleic acids may have the same adaptor at each end).
  • all of the nucleic acids in a library may comprise the same adaptor sequence, or adaptor sequences which differ only in an index sequence.
  • the adaptors and adaptor sequences are synthetic sequences that are not found within the mammalian genome.
  • Adaptors suitable for use in sequencing nucleic acids are well-known in the art. Adaptors are generally specific for a sequencing platform and the sequence of the adaptor therefore depends on the specific sequencing method to be employed. Adaptors suitable for any specific sequencing method may be designed and produced using known techniques or obtained from commercial sources. Suitable sequencing platforms include Sanger sequencing, Solexa-Illumina sequencing platforms, such as HiSeq™, MiSeq™ and NextSeq™, semiconductor array sequencing (Ion Torrent™; LifeTech),
  • pyrosequencing e.g. 454 Sequencing; Roche 454
  • SMRT™ single molecule real-time sequencing
  • adaptors suitable for any of these sequencing platforms may be used in the methods described herein.
  • adaptors may include a region that is complementary to the universal primers on a solid support (e.g. a flowcell or bead) and a region that is complementary to universal sequencing primers (i.e. which when annealed to the adaptor sequence and extended allows the sequence of the nucleic acid molecule to be read) .
  • Adaptor sequences suitable for use as described herein may be 20 to 80 nucleotides long.
  • the adaptor may comprise a sequence that hybridises to complementary primers immobilised on the solid support (e.g. 20-30 nucleotides); a sequence that hybridises to a sequencing primer (e.g. 30-40
  • a suitable adaptor may be 56-80
  • Adaptors for Illumina TruSeq™ sequencing may be 64 nucleotides long (including a 6 nucleotide index).
  • one or more of the adaptors may have all the cytosine bases in the methylated form. If the adaptors contain unmethylated cytosines, the adaptors are altered during the bisulfite conversion such that any unmethylated cytosines become uracil. Thus any adaptors attached to the sample prior to bisulfite exposure may be free of unmethylated cytosine bases.
  • the adaptor sequence comprises 5-methylcytosines instead of cytosines, in order to prevent deamination of cytosines in the bisulfite conversion reaction. Preventing the conversion of cytosines in the adaptor sequence to U (read as T) may be useful in ensuring that the adaptor sequence is able to hybridise to the flowcell of the sequencing platform.
  • the 3' adaptor sequence may comprise one or more modified nucleotides or nucleotide analogues that are resistant to bisulfite damage.
  • the 3' adaptor sequence consists of modified nucleotides or nucleotide analogues or
  • the adaptor sequence may comprise or consist of locked nucleic acid (LNA) nucleotides, peptide nucleic acid (PNA) nucleotides, glycol nucleic acid (GNA), threose nucleic acid (TNA), morpholino oligomers, 2-substituted nucleotides, such as 2-fluorocytosine, 2-aza-cytosine and 2-O-methylcytosine, 6-substituted nucleotides, such as 6-fluorocytosine, 6-O-methylcytosine and 6-aza-cytosine, and/or other modified nucleotides.
  • LNA locked nucleic acid
  • PNA peptide nucleic acid
  • GNA glycol nucleic acid
  • TNA threose nucleic acid
  • Suitable modified nucleotides prevent the bisulfite from forming an adduct with the bases, reducing the propensity for abasic site formation and hence reducing the chance of fragmentation.
  • An adaptor may comprise or consist of nucleotide sequences that are common to the members of the library i.e. each nucleic acid in the library may contain the same adaptor sequences. Libraries produced from different sources may be mixed before sequencing.
  • one or both of the adaptors attached to a target nucleic acid may comprise an individual barcode or index nucleotide sequence that identifies the source of the nucleic acid (e.g. the sample) and allows the multiple samples to be sequenced in a multiplex sequencing reaction. Outside the index, the sequences of the adaptors or adaptor sequences may be the same for all the nucleic acids in the library.
  • the nucleotide sequence of the index allows unambiguous
  • a suitable index for the multiplex sequencing of 24 samples may consist of at least 6 nucleotides, preferably 6 nucleotides (Craig DW et al. 2008. Nat Methods 5, 887; Cronn R et al. 2008. Nucleic Acids Res, 36, e122).
  • the use of indexes, barcodes or identifiers in sequencing reactions is well-known in the art.
  • the adaptors at the 3' and 5' ends of the nucleic acids in a library produced as described herein from a first sample may have the same "core" sequence as the sequences at the 3' and 5' ends of nucleic acids in a library produced from a second sample except for the index, which is unique to the nucleic acid strands from a particular sample.
  • an index of n/4 bases may differ in the sequences at the 3' and/or 5' ends of nucleic acids from different populations. This maintains the specificity of the adaptor sequences for the sequencing platform (e.g.
  • a sample may be allocated a unique index. Multiple samples may be pooled together in the same sequencing run and sequenced in parallel and then the sequences arising from each individual sample identified from the unique index sequences.
  • the adaptor sequence may be added to the 3' ends of the target nucleic acids by ligating an adaptor
  • the adaptor oligonucleotide may be ligated to the 3' ends by any convenient ligation method.
  • the adaptor oligonucleotide may be ligated to the 3' ends without ligation or other modification of the 5' ends of the nucleic acids.
  • the adaptor oligonucleotide may be linked to a binding tag, for example via a cleavable linker. This is described in more detail below.
  • the ligation of the adaptor oligonucleotide is directional
  • oligonucleotide and the 5' ends of the nucleic acids remain unadapted (i.e. no adaptor sequence is ligated to the 5' ends of the nucleic acids).
  • the adaptor oligonucleotide may be added to the 3' ends of the double-stranded target nucleic acids by any convenient method.
  • the adaptor oligonucleotide may be attached to the 3' ends by suitable ligation methods, including double-stranded ligation, single-stranded ligation, blunt-ended ligation or overhanging ligation.
  • the adaptor oligonucleotide is added to the 3' ends of the double-stranded target nucleic acids as part of an at least partially double-stranded complex or molecule.
  • ligation may be carried out using an enzyme with double-stranded ligase activity in the presence of an inert hybridisation partner that hybridises to the adaptor oligonucleotide but does not ligate to the nucleic acids.
  • the adaptor oligonucleotide may be hybridised to a complementary oligonucleotide to form a ligation complex that allows the adaptor oligonucleotide to be ligated to the double-stranded target nucleic acids using a double-strand specific ligase, such as T4 ligase, without ligation of the complementary oligonucleotide.
  • the complementary oligonucleotide may be complementary to all or part of the adaptor oligonucleotide.
  • the complementary oligonucleotide may be complementary to the 5' end of the adaptor oligonucleotide, such that the ligation complex comprises a double stranded region at the 5' end of the adaptor oligonucleotide and a single stranded overhang at the 3' end of the adaptor
  • the adaptor oligonucleotide of the ligation complex is ligated to the 3' ends of the double-stranded target nucleic acids but the complementary oligonucleotide of the complex is not ligated to the 5' ends of the double-stranded nucleic acids.
  • one or both of the 5' ends of the double-stranded target nucleic acids and the 3' end of the complementary oligonucleotide may be non-ligatable.
  • the 5' ends of the double-stranded nucleic acids may be non-ligatable through the absence of a phosphate group and/or the 3' end of the complementary oligonucleotide may be non-ligatable through the absence of an OH group, for example due to the presence of a blocking group, such as a halogen, or more preferably a dideoxynucleotide.
  • the complementary oligonucleotide may be 3' substituted or comprise a 3' dideoxynucleotide.
  • oligonucleotide specifically to the 3' ends of the nucleic acids may be facilitated by the modification of the nucleic acid ends.
  • a 3' overhanging adenine (A) residue may be present at the ends of the nucleic acids.
  • the ligation complex may comprise a 3' overhanging T residue which facilitates ligation to nucleic acids in the population comprising a 3' overhanging A residue.
  • the adaptor oligonucleotide may be ligated to the 3' ends of the nucleic acid population by a method comprising; producing a population of nucleic acids lacking 5' phosphate groups and having overhanging 3' A residues,
  • the adaptor sequence may be added to the 3' ends of the target nucleic acids by ligating a double-stranded adaptor to the ends of the target nucleic acids.
  • the double stranded adaptor may comprise the adaptor sequence hybridised to a
  • the adaptor sequence may be ligated to the 3' ends of the double-stranded target nucleic acids and the complementary sequence may be ligated to the 5' ends of the double-stranded target nucleic acids. Any convenient double stranded ligation method may be used to ligate the double stranded adaptor. After ligation and optionally bisulfite treatment, the 5' ends of the nucleic acids may be cleaved to remove the complementary sequence and produce nucleic acids having an adaptor sequence at the 3' end but not at the 5' end.
  • the adaptor sequence of the double-stranded adaptor may be linked to a binding tag, for example via a cleavable linker. This is described in more detail below.
  • the double stranded adaptor comprises a cleavage site at the 3' end of the complementary sequence.
  • the adapted nucleic acids may be cleaved at the cleavage site to remove the complementary sequence ligated to the 5' ends of the nucleic acids, such that adapted nucleic acids have an adaptor sequence at the 3' end, but lack additional sequence at the 5' end.
  • a double-stranded adaptor may comprise more than one cleavage site, for example two or three.
  • the double-stranded adaptor may comprise a cleavage site at the 3' end of the complementary sequence and one or more additional cleavage sites, for example within a hairpin sequence or elsewhere.
  • the double stranded adaptor is added to the target nucleic acids before bisulfite treatment and the
  • the double-stranded adaptor is a hairpin adaptor which comprises a hairpin nucleotide sequence that links the adaptor sequence and the complementary sequence i.e. the double-stranded adaptor consists of a polynucleotide chain which forms a double stranded region and a single-stranded hairpin region. This may be useful in protecting the ends of the nucleic acids from damage, for example during bisulfite treatment.
  • the hairpin adaptor comprises a first cleavage site at the 3' end of the complementary sequence and a second cleavage site at the 5' end of the adaptor sequence. Cleavage of the first and second cleavage sites produces a population of nucleic acids having the adaptor sequence at the 3' ends but lacking an adaptor sequence at the 5' ends.
  • Suitable cleavage sites include any site that is specifically cleavable by enzymatic, chemical or other means.
  • Suitable cleavage sites are well known in the art and include modified nucleotides, such as 8-oxoguanine or 8-oxoadenine, which are cleavable by formamidopyrimidine [fapy]-DNA glycosylase (Fpg), and restriction endonuclease recognition sites.
  • the nucleic acids may be treated with bisulfite following addition of the adaptor sequence to the 3' ends.
  • bisulfite may have unmodified 5' ends.
  • a double stranded adaptor such as a hairpin adaptor
  • the nucleic acids treated with bisulfite may have complementary sequences ligated to the 5' ends. These complementary sequences may be removed after the bisulfite treatment to produce bisulfite treated nucleic acids having an adaptor sequence at the 3' end but not the 5' end.
  • Treatment with bisulfite converts unmodified cytosine residues to uracil residues, thereby producing a population of nucleic acid strands comprising uracil residues instead of unmodified cytosines. This may be useful in bisulfite sequencing methods (BS-seq). Bisulfite treatment may also denature the population of double stranded nucleic acids.
  • a method as described above may comprise treating the population of nucleic acids with bisulfite.
  • a method of preparing a nucleic acid library may comprise;
  • the adaptor sequence may be added to the 3' ends of the nucleic acids by ligating an adaptor oligonucleotide to the 3' ends but not to the 5' ends of the nucleic acids.
  • the adaptor sequence may be added to the 3' ends of the nucleic acids by ligating a double-stranded adaptor comprising the adaptor sequence hybridised to a complementary sequence to the ends of the nucleic acids, preferably a hairpin adaptor, such that the adaptor sequence is ligated to the 3' ends of the double-stranded nucleic acids and the complementary sequence is ligated to the 5' ends of the double-stranded nucleic acids. After treatment with bisulfite, the 5' ends of the nucleic acids are cleaved to remove the complementary sequence.
  • a method of preparing a nucleic acid library may comprise;
  • the double-stranded adaptor is a hairpin adaptor comprising a single stranded hairpin nucleotide sequence that links the hybridised adaptor and complementary sequences, as described above.
  • the nucleic acids may be treated with
  • the initial population of nucleic acids in step (i) above may be a bisulfite treated population of nucleic acids.
  • the initial population of double-stranded nucleic acids may be provided by a method comprising;
  • the population of double-stranded nucleic acids may then be treated in accordance with steps (i) to (vi) above.
  • the nucleic acids may be subjected to an additional treatment before treatment with bisulfite. This may be useful, for example, in performing variants of standard bisulfite sequencing methods (BS-seq), for example to identify specific cytosine modifications, such as 5hmC, 5fC and 5caC.
  • BS-seq standard bisulfite sequencing methods
  • methods may comprise treating the nucleic acids with an oxidising agent, and then treating the oxidised nucleic acids with bisulfite.
  • Suitable oxidising agents are well known in the art and include metal oxides, such as KRuO4, MnO2 and KMnO4, and perruthenates, such as potassium perruthenate (KRuO4).
  • KRuO4 potassium perruthenate
  • Techniques for oxidative bisulfite sequencing are well known in the art (OxBS-seq; Booth et al (2013) Nat Protoc 8 1841-1851; Booth et al (2012) Science 336 934-937) and reagents are available from commercial sources (e.g. Cambridge Epigenetix Ltd. UK).
  • Methods may comprise treating the nucleic acids with a reducing agent, and then treating the reduced nucleic acids with bisulfite.
  • Suitable reducing agents are well-known in the art and include NaBH4, NaCNBH4 and LiBH4.
  • Techniques for reductive bisulfite sequencing are available in the art (redBS-seq; WO2013/017853).
  • Methods may comprise treating the nucleic acids with β-glucosyltransferase in the presence of UDP-Glucose to add a glucosyl protecting group to 5hmC residues in the nucleic acids; treating the nucleic acids with TET to oxidise 5mC residues in the nucleic acids to 5caC and then treating the TET-oxidised nucleic acids with bisulfite.
  • Techniques for TET-assisted bisulfite sequencing are well-known in the art (TAB-seq; Yu et al (2012) Nat Protoc. 7(12) 2159-2170; Yu et al Cell (2012) 149(6): 1368-1380) and reagents are available from commercial sources (e.g. Wisegene LLC USA).
  • Methods may comprise labelling 5caC residues in the nucleic acids with l-ethyl-3- [3-dimethylaminopropyl] carbodiimide hydrochloride
  • Methods may comprise labelling 5fC residues in the nucleic acids with O-ethylhydroxylamine; and then treating the labelled nucleic acids with bisulfite.
  • Techniques for 5fC chemical modification-assisted bisulfite sequencing are well-known in the art (fCAB-seq; Song et al (2013) Cell 153 1-14).
  • the strands of nucleic acids in the libraries described herein may comprise nucleotide sequences that are bisulfite-treated (i.e.
  • nucleotide sequences that are the complement of
  • bisulfite-treated sequences i.e. containing adenine instead of unmodified cytosine in the untreated sequence
  • nucleotide sequences that are the complement of the sequences complementary to bisulfite-treated sequences i.e. containing thymine instead of unmodified cytosine in the untreated sequence
  • unmodified cytosines may be replaced by uracil in nucleic acid strands following bisulfite treatment.
  • the resultant double-stranded molecules comprise a uracil-containing strand and a non-uracil containing complementary strand. Either or both of these strands may be subsequently isolated and sequenced.
  • sequences of bisulfite-treated nucleic acids may be useful in determining the presence or frequency of modified cytosine residues, such as 5mC, in samples of nucleic acid.
  • Bisulfite treatment causes extensive depyrimidination and strand cleavage in populations of nucleic acids. For example, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 95% or more, or 99% or more of the nucleic acids may be cleaved during bisulfite treatment.
  • the nucleic acids in the populations used to produce libraries as described herein may include nucleic acids that are not cleaved by the bisulfite treatment and nucleic acids that are cleaved by the bisulfite treatment.
  • the bisulfite treated population therefore comprises nucleic acid strands of a range of sizes, depending on whether cleavage has occurred and its location relative to the 3' adaptor sequence.
  • a library produced as described herein may comprise nucleic acids ranging from 10 bp to 5 kb, 20 bp to 2 kb or 30 bp to 1 kb (Ehrich et al Nucleic Acids Research, 2007, Vol. 35, No. 5 e29).
  • the libraries contain a greater proportion of the initial nucleic acid population than libraries that only comprise uncleaved nucleic acids.
  • the number of nucleic acids in the nucleic acid library may be greater than 0.1%, greater than 1%, greater than 5% or greater than 10% of the number of nucleic acids in the initial population.
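The size range described above can be illustrated with a minimal sketch. It assumes an idealised model that is not stated in the text: each strand carries its adaptor at the 3' end, cleavage happens at given positions, and only the fragment downstream of the last cleavage site keeps the adaptor. The function name and the example sequences are hypothetical.

```python
def adapted_fragment(strand, cut_sites):
    """Return the part of a 5'->3' strand that still carries the 3' adaptor.

    cut_sites are 0-based indices; a cut at index i separates
    strand[:i] from strand[i:]. Cuts outside the strand are ignored.
    """
    valid = [c for c in cut_sites if 0 < c < len(strand)]
    if not valid:
        return strand  # uncleaved strand is retained in full
    return strand[max(valid):]  # fragment 3' of the last cut keeps the adaptor


# Hypothetical insert and adaptor sequences, for illustration only.
insert = "ACGTTGCAAC"
adaptor = "GGATCC"
strand = insert + adaptor

uncleaved = adapted_fragment(strand, [])
cleaved = adapted_fragment(strand, [3, 7])
```

In this model both cleaved and uncleaved strands contribute one adaptor-bearing molecule each, which is why the resulting population spans a range of sizes.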
  • Libraries produced by the methods described herein may contain sufficient sequenceable nucleic acid molecules to allow sequencing without amplification i.e. the nucleic acids in the library may be sequenced without being amplified.
  • a nucleic acid library is produced as described herein without any amplification of the nucleic acids in the sample.
  • Bisulfite treatment converts unmethylated cytosine residues in a polynucleotide into uracil.
  • the use of bisulfite ions (HSO3−) to convert unmethylated cytosines in nucleic acids into uracil is standard in the art and suitable reagents and conditions are well known to the skilled person. Numerous suitable protocols and reagents are also commercially available (for example, EpiTect™, Qiagen NL; EZ DNA Methylation™, Zymo Research Corp CA; CpGenome Turbo Bisulfite Modification Kit, Millipore).
  • the population of double stranded nucleic acids may be treated with bisulfite by incubation with bisulfite ions (HSO3−), for example sodium bisulfite (NaHSO3).
  • Suitable conditions for bisulfite treatment are well known in the art and typically range from 1-16 hours.
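As a minimal in-silico sketch of the conversion described above, the following assumes an idealised, complete reaction in which every unmodified cytosine becomes uracil and every protected (for example methylated) cytosine is left unchanged. The function name and the way protected positions are passed in are illustrative assumptions, not part of the described method.

```python
def bisulfite_convert(seq, protected_positions=frozenset()):
    """Idealised bisulfite conversion of a single 5'->3' strand.

    Unmodified C is written as U; cytosines whose 0-based index is in
    protected_positions (e.g. 5mC) are left as C.
    """
    return "".join(
        "U" if base == "C" and i not in protected_positions else base
        for i, base in enumerate(seq.upper())
    )


# Hypothetical example: the cytosine at index 1 is methylated.
converted = bisulfite_convert("ACGTCC", protected_positions={1})
```

Real reactions are incomplete and also cleave a large share of the strands, so this sketch only captures the base change itself.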
  • Bisulfite treatment as described above may denature double-stranded 3' adapted nucleic acids to produce nucleic acid strands of a range of sizes that all have the adaptor sequence at the 3' end.
  • nucleic acid strands in the population are then converted into double stranded DNA molecules through the generation of a complementary strand.
  • nucleic acids may be subjected to an additional denaturation step following bisulfite treatment.
  • double-stranded nucleic acids may be denatured following addition of the 3' adaptor sequence without bisulfite treatment .
  • the nucleic acids may be denatured to disrupt any inter or intra molecular hybridisation. Denaturation converts 3' adapted double-stranded nucleic acids into single nucleic acid strands.
  • the population of double-stranded nucleic acids may be denatured by any convenient method following the addition of the 3' adaptor sequence. For example, the nucleic acids may be denatured by heating or treatment with a chemical denaturant .
  • the complementary strand may be generated by annealing an oligonucleotide primer to the 3' adaptor of the nucleic acid strand and extending the primer in a 5' to 3' direction along the strands to synthesise complementary strands.
  • oligonucleotide primer is preferably complementary to all or part of the 3' adaptor, so that it hybridises under standard hybridisation conditions.
  • complementary strands may be generated for all the nucleic acid strands using the same oligonucleotide primer.
  • the oligonucleotide primer may be linked to a binding tag, for example via a cleavable linker. This is described in more detail below.
  • oligonucleotide primers and primer extension along a single-stranded template are well-known in the art and reagents are available from commercial sources.
  • the population comprises double-stranded nucleic acids that have an adaptor at one end (i.e. an adapted first end) comprising the 3' adaptor sequence.
  • the double-stranded nucleic acids in the population have a uracil-containing strand and a non-uracil containing complementary strand.
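The primer extension step can be sketched as follows. This is a simplified model under stated assumptions: the primer is fully complementary to the 3' end of the template, the adaptor sequence is unchanged by bisulfite treatment, and uracil in the template is copied as adenine. The sequences and function names are hypothetical.

```python
# Base pairing used when copying a template; U pairs with A like T does.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(seq):
    """Return the 5'->3' DNA complement of a 5'->3' strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template, primer):
    """Copy a template from a primer annealed at its 3' end.

    Returns the new strand (5'->3'), or None if the primer does not
    match the template's 3' end, e.g. a fragment lacking the adaptor.
    """
    binding_site = reverse_complement(primer)
    if not template.endswith(binding_site):
        return None
    return reverse_complement(template)


adaptor = "GGATTG"                      # hypothetical 3' adaptor
primer = reverse_complement(adaptor)    # primer complementary to the adaptor
template = "AUGUU" + adaptor            # bisulfite-converted strand
copy = extend_primer(template, primer)
unadapted = extend_primer("AUGUU", primer)
```

The copied strand starts with the primer and contains only A, T, G and C, while a strand without the 3' adaptor gives no product.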
  • the unadapted (i.e. second) end of the nucleic acids in the population may be repaired and/or adapted to facilitate ligation of the second adaptor.
  • 5' phosphate groups and/or 3' adenine overhangs may be added. Suitable methods for A tailing and/or phosphorylation are well-known in the art.
  • the second ends of the double stranded nucleic acids are adapted through the ligation of the second adaptor.
  • the second adaptor may have the same or a different nucleotide sequence to the adaptor.
  • the second adaptor may be ligated to the second end by any convenient technique.
  • the second adaptor may comprise a 3' T overhang at one or both ends to facilitate ligation to second ends that comprise a 3' A overhang.
  • the second adaptor may be ligated to the second ends by;
  • nucleic acids having an adapted first end and a 3' A overhang and a 5' phosphate group at the second end, as described above,
  • nucleic acids that are sequenceable are isolated from nucleic acids that are non-sequenceable (i.e. molecules not
  • nucleic acids that comprise adaptors at both ends may be isolated, separated or removed from other nucleic acids.
  • nucleic acids that comprise adaptors at both ends.
  • the nucleic acids may be immobilised on a support and other nucleic acids washed away.
  • immobilised nucleic acids may be interrogated directly.
  • nucleic acids may be released from the solid support, following washing.
  • nucleic acid strands that comprise the adaptor at the 3' end i.e. sequenceable nucleic acids
  • 3' adapted nucleic acids may then be separated from nucleic acid strands cleaved by the bisulfite treatment and lacking a 3' adaptor.
  • the nucleic acid strands are immobilised before the double stranded nucleic acid is regenerated by primer extension.
  • the population of nucleic acid strands having a 3' adaptor sequence may be immobilised on a solid support following step (iii) .
  • a method of preparing a nucleic acid library may comprise;
  • the nucleic acid strands may be immobilised through a binding tag that is linked to the adaptor sequence for example via a chemical linker.
  • the binding tag may be linked to the adaptor sequence, such that addition of the adaptor sequence, for example by ligation of the adaptor oligonucleotide, links the binding tag to the nucleic acid strands .
  • the population of nucleic acids may be immobilised following primer extension and the generation of the complementary strand.
  • the population of nucleic acids comprising the first adaptor may be immobilised on a solid support following step (v) .
  • the nucleic acids having the first adaptor are immobilised through a binding tag that is linked to the regenerated complementary strand.
  • the binding tag may be covalently linked to the oligonucleotide primer, such that hybridisation of the oligonucleotide primer and subsequent
  • a method of preparing a nucleic acid library may comprise;
  • the nucleic acids may be isolated and/or purified through the binding of a capture member to the binding tag.
  • the capture member and the binding tag may form a specific binding pair.
  • Suitable specific binding pairs may include antibody/immunogenic epitope, such as anti-digoxigenin antibody/digoxigenin; glutathione S-transferase/glutathione; and biotin/biotin binding protein.
  • the binding tag may be an antigen, such as digoxigenin, glutathione, or biotin and the capture member may be an antibody, such as an anti-digoxigenin antibody, glutathione-S-transferase , or a biotin-binding protein, such as streptavidin, avidin, anti-biotin antibody or neutravidin, respectively.
  • the tag is biotin and the capture member is streptavidin.
  • the capture member may be immobilised, for example on a solid support. Binding of the tag to the capture member immobilises the nucleic acid linked to the tag on the solid support.
  • a solid support is an insoluble body which presents a surface on which the capture member can be immobilised for capture of the labelled nucleic acid.
  • suitable supports include glass slides, microwells, membranes, or microbeads .
  • the support may be in particulate or solid form, including for example a plate, a test tube, bead, a ball, filter, fabric, polymer or a membrane.
  • Nucleic acids may, for example, be fixed to an inert polymer, a 96-well plate, other device, apparatus or material which is used in nucleic acid sequencing or other investigative context.
  • the immobilisation of polynucleotides to the surface of solid supports is well-known in the art.
  • the solid support itself may be immobilised.
  • microbeads may be immobilised on a second solid surface.
  • the solid support may be a magnetic bead.
  • the nucleic acid-binding tag-capture member complex may be washed, for example, to remove non-immobilised molecules from its environment, including unlabelled nucleic acids and other reagents and molecules . Suitable techniques and reagents for washing immobilised complexes are well-known in the art.
  • the nucleic acids may then be released from the solid support using any convenient technique to produce a nucleic acid library.
  • the oligonucleotide primer or oligonucleotide adaptor may be linked to the binding tag through a cleavable linker, for example a linker comprising a chemically sensitive cleavage site. This may facilitate release of the nucleic acids from the solid support.
  • the cleavable linker is attached to a terminus of the backbone of the nucleic acids in the population through a chemical modification, such as a 5' modified benzaldehyde group, and is not attached to a base in the nucleic acid.
  • the linker is chemically cleavable by the disruption of one or more covalent bonds to separate the ends of the probe.
  • the linker comprises a cleavage site which may be chemically cleaved under appropriate conditions.
  • a range of suitable cleavage sites are available in the art, including azide masked hemiaminal ethers, protected hemiaminal ethers, phosphine containing groups, silicon containing groups, disulphides, cyanoethyl groups and photocleavable groups. Examples of suitable cleavage chemistries are shown in Figure 2.
  • the linker may comprise an azide masked hemiaminal ether site. Azide masked hemiaminal ether sites (-OCHN3-) may be cleaved by reduction of the azide to an amine, followed by spontaneous hemiaminal ether cleavage (reaction 1 in Figure 2).
  • Suitable reducing agents include phosphines (e.g. TCEP), thiols (e.g. DTT, EDT) and metal-ligand complexes, including organometallic Ru-, Ir-, Cr-, Rh- and Co- complexes.
  • Suitable metal-ligand complexes may include organometallic ruthenium (II) complexes, for example ruthenium (II) polypyridine complexes, such as tris(bipyridine)ruthenium(II) ([Ru(bpy)3]2+) and salts thereof, including Ru(bpy)3Cl2.
  • Other suitable metal-ligand complexes may include organometallic iridium (II) complexes, for example iridium polypyridine complexes, such as Ir(ppy)2(dtb-bpy)+, where ppy is phenylpyridine and dtb-bpy is 4,4'-di-tert-butyl-2,2'-bipyridine, and salts thereof.
  • the linker may comprise a protected hemiaminal ether site.
  • Protected hemiaminal ether sites may be cleaved by removal of the amine protecting group, followed by spontaneous hemiaminal ether cleavage (reaction 2 in Figure 2) .
  • Suitable protecting groups include allyl or allyl carbamates, which may be cleaved using transition metals with water soluble ligands (e.g. Pd with water soluble phosphine ligands); sulfmoc, which may be cleaved with a mild base, e.g.
  • the linker may comprise a phosphine containing site.
  • Phosphine containing sites, for example comprising the structure shown in reaction 3 of Figure 2, may be cleaved by the addition of an azide reagent, for example an alkyl or aryl azide, such as benzyl azide.
  • the Staudinger aza-ylid generated reacts intramolecularly with an ester to release the captured DNA.
  • the linker may comprise a silicon containing site.
  • Silicon containing sites may be cleaved by vicinal elimination of silicon in the presence of fluoride ions, such as KF and tetra-n-butylammonium fluoride (TBAF) (reaction 4 in Figure 2).
  • the linker may comprise a disulfide site.
  • Disulfide sites may be cleaved by reduction with phosphines, such as TCEP, or thiols, such as DTT.
  • the linker may comprise a cyanoethyl site. Cyanoethyl sites may be cleaved under basic conditions, such as NH3 or 10% K2CO3.
  • the linker may comprise a photocleavable site.
  • Photocleavable sites may be cleaved by treatment with UV light, preferably of a
  • Suitable photocleavable sites are well known in the art.
  • an orthonitrobenzyl group may be cleaved by UV at 365 nm.
  • a suitable linker may have a total length not exceeding the length of a normal alkyl chain of 2-20 carbons and may comprise from one to about 50 atoms.
  • a suitable linker may have the formula: R1-Cs-R2, wherein Cs is the cleavage site and R1 and R2 are
  • R1 and R2 may be selected from the group consisting of substituted or unsubstituted alkyl, substituted or unsubstituted alkenyl, substituted or unsubstituted alkynyl, substituted or unsubstituted cycloalkyl, substituted or
  • the nucleic acids may be released from the solid support by chemical cleavage of the cleavage site in the linker. This may, for example, release the nucleic acid from a nucleic acid-binding tag-capture member complex. Cleavage of the linker separates the nucleic acid from the binding tag which remains bound to the capture member.
  • step (ix) above may comprise;
  • nucleic acid library comprising nucleic acid strands having a first adaptor at a first end and a second adaptor at a second end.
  • the immobilised nucleic acids may be denatured, for example using a denaturant such as NaOH, in accordance with standard techniques.
  • nucleic acid manipulation techniques, including fragmentation, end-repair, ligation, A-tailing, and primer extension as described herein, are known in the art (see for example, Molecular Cloning: a Laboratory Manual: 3rd edition, Russell et al., 2001, Cold Spring Harbor Laboratory Press; Protocols in Molecular Biology, Second Edition, Ausubel et al. eds., John Wiley & Sons, 1992).
  • Methods described herein may comprise interrogating the nucleic acids in the library to identify one or more bases .
  • a method may comprise sequencing one or more,
  • nucleic acids may be sequenced using any convenient low or high throughput sequencing technique or platform, including Sanger sequencing, Solexa-Illumina sequencing, ligation-based sequencing (SOLiD™), pyrosequencing, Pacific Biosciences single molecule real-time sequencing (SMRT™), and semiconductor array sequencing (Ion Torrent™).
  • Suitable protocols, reagents and apparatus for nucleic acid sequencing are well-known in the art and are available
  • the nucleic acids in the library are sequenced without amplification .
  • a strand of the nucleic acids in the library may contain uracil residues. These nucleic acids may be sequenced using a uracil tolerant polymerase. Nucleic acid strands lacking uracil residues (i.e. the complement of the uracil containing strand) may be prepared and sequenced using standard techniques.
  • the nucleic acids in the nucleic acid library may be interrogated by PCR, hybridisation, for example to an array of immobilised probes or other analysis methods.
  • a method may comprise determining the identity of a residue in nucleic acids in the library at a position that corresponds to cytosine in a non-bisulfite treated nucleic acid. This may allow the identification of modified cytosine residues, such as
  • nucleotide sequence may be determined. For example, the proportion or amount of 5-methylcytosine at a position in a nucleotide sequence compared to unmodified cytosine may be determined in a sample.
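The proportion described above is commonly estimated from base calls at a reference cytosine: reads showing C come from protected cytosines, and reads showing T come from converted ones. The sketch below assumes all reads are from the same bisulfite-converted strand and that plain bisulfite treatment was used, so a C call does not distinguish 5mC from 5hmC. The function name is illustrative.

```python
def methylation_fraction(base_calls):
    """Estimate the modified fraction at one reference cytosine.

    base_calls is an iterable of single-letter calls from reads covering
    the position. Calls other than C or T are ignored as uninformative.
    Returns None when no informative call is available.
    """
    calls = list(base_calls)
    protected = sum(1 for b in calls if b == "C")   # read as C: not converted
    converted = sum(1 for b in calls if b == "T")   # read as T: was unmodified C
    informative = protected + converted
    if informative == 0:
        return None
    return protected / informative


# Hypothetical pileup: three reads call C and one calls T.
level = methylation_fraction("CCTC")
```

Running the same calculation on a matched oxidative or reductive bisulfite library would be needed to attribute the signal to a specific cytosine modification.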
  • kits for use in the preparation of a nucleic acid library for example using a method described above, comprising; an adaptor oligonucleotide,
  • kits for use in the preparation of a nucleic acid library as described herein may comprise;
  • a double stranded adaptor comprising an adaptor sequence and a complementary sequence
  • Suitable adaptors, oligonucleotides and primers are described in detail above.
  • the complementary oligonucleotide and the oligonucleotide primer are hybridisable to the adaptor oligonucleotide and are preferably complementary to all or part of the adaptor oligonucleotide.
  • the adaptor oligonucleotide may comprise or consist of nucleotide analogues, such as PNA or LNA, or modified nucleotides, such as 2' substituted nucleotides, and may be
  • the complementary oligonucleotide may lack a 3'
  • the complementary oligonucleotide may comprise a 3' blocking group, such as a 3' halogen group, or may comprise a 3' dideoxynucleotide .
  • One of the adaptor oligonucleotide and the oligonucleotide primer may be linked to a binding tag via a cleavable linker. Cleavable linkers and binding tags are discussed above.
  • the kit may further comprise a bisulfite reagent (HSO3−), as
  • the kit may further comprise nucleic acid isolation reagents.
  • Suitable reagents are well-known in the art and include spin-chromatography columns.
  • the kit may further comprise end-repair reagents, for example reagents to produce blunt ended nucleic acid fragments.
  • Suitable reagents are well-known in the art and may include a 5'→3'
  • the end-repair reagents may comprise T4 DNA polymerase and Klenow fragment.
  • the end-repair reagents produce blunt ended nucleic acid fragments lacking 5' phosphate groups and do not include a 5' kinase, such as T4 kinase.
  • the kit may further comprise a cleavage agent which cleaves the adaptor at the cleavage sites. Suitable cleavage agents are described in more detail above.
  • the kit may further comprise end-modification reagents, for example reagents for the addition of a 3' A tail, such as dATP and Taq DNA Polymerase .
  • the kit may further comprise one or more reagents for performing a variant bisulfite sequencing method.
  • a kit may comprise an oxidising agent, such as a metal oxide, such as KRuO4, MnO2 and KMnO4, or a perruthenate, such as potassium perruthenate (KRuO4), for oxidative bisulfite sequencing.
  • a kit may comprise a reducing agent, such as NaBH4, NaCNBH4 or LiBH4, for reductive bisulfite sequencing.
  • a kit may comprise a β-glucosyltransferase, UDP-Glucose and a TET enzyme for TET-assisted bisulfite sequencing.
  • a kit may comprise 1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride (EDC) or O-ethylhydroxylamine for chemical modification-assisted bisulfite sequencing and 5fC chemical modification-assisted bisulfite
  • kits may include one or more other reagents required for the method, such as buffer solutions, sequencing and other reagents
  • the kit may further comprise a labelling buffer for attachment of the capture member to nucleic acid containing the binding tag.
  • the kit may further comprise a release buffer for cleavage of a cleavable linker which is attached to nucleic acid.
  • Suitable release buffers depend on the cleavage chemistry involved and may comprise a reducing agent, for example a thiol, phosphine or metal-ligand complex reducing agent, as described above.
  • the kit may further comprise a capture member.
  • the capture member may bind specifically to the binding tag of the oligonucleotide adaptor or primer in the kit.
  • the kit may further comprise a solid support.
  • the solid support may be coated or coatable with the capture member. Suitable solid supports are described above and include magnetic beads.
  • the binding tag is biotin and the solid support is streptavidin-coated magnetic beads.
  • a magnet may be included in the kit for purification of the magnetic beads.
  • the kit may further comprise sequencing reagents.
  • the kit may comprise a uracil-tolerant polymerase.
  • the kit may further comprise one or more oligonucleotides or nucleic acids for use as controls.
  • oligonucleotide or nucleic acid may comprise at least one modified cytosine residue.
  • a suitable negative control oligonucleotide or nucleic acid may be devoid of modified cytosines.
  • oligonucleotides may be made synthetically by standard methods .
  • the kit may comprise a DNA strand for
  • a kit for use in preparation of a nucleic acid library may include one or more articles and/or reagents for performance of the method, such as means for providing the test sample itself, including DNA and/or RNA isolation and purification reagents, sample handling containers (such components generally being sterile), and other reagents required for the method, such as buffer solutions,
  • the kit may include instructions for use in a method of preparation of a nucleic acid library as described above.
  • Another aspect of the invention provides the use of a kit as set out above in the preparation of a nucleic acid library, for example using a method described above.
  • FIG. 1 shows quantification of the full-length target DNA
  • Figure 2 shows examples of cleavage chemistries which may be used in a cleavable linker in the library preparation protocols described herein .
  • Figure 3 shows a library preparation protocol according to an embodiment of the invention.
  • Figure 4 shows a library preparation protocol according to another embodiment of the invention.
  • Figure 5 shows a library preparation protocol according to another embodiment of the invention.
  • Figure 6 shows a library preparation protocol according to another embodiment of the invention.
  • Figure 7 shows qPCR of control after BS treatment and sample DNA using the modified protocol described above.
  • Figure 8 shows an Agilent Tapestation electropherogram of the DNA fragment distribution of the sample mixture, analysed after qPCR.
  • Figure 9 shows a first set of results of a sequencing run on 150 bp paired ends using an Illumina Miseq instrument and 500ng of E. coli genomic DNA.
  • Upper track shows DNA prepared by a new PCR free BS treatment process according to an embodiment of the invention.
  • Lower track shows DNA prepared using a standard process comprising BS treatment and 15 cycles of PCR.
  • Figure 10 shows a second set of results of a sequencing run on 150 bp paired ends using an Illumina Miseq instrument and 500ng of E. coli genomic DNA.
  • Upper track shows DNA prepared by a new PCR free BS treatment process according to an embodiment of the invention.
  • Lower track shows DNA prepared using a standard process comprising BS treatment and 15 cycles of PCR.
  • Figure 11 shows the genomic coverage of a BS sequencing preparation described herein relative to a standard BS sequencing preparation as a log2 ratio (old method/new method). Coverage was summed in 100 base pair windows along the genome, and normalized to correct for differences in mapped reads. Output values are negative where the old method has fewer reads and positive where the new method has fewer reads. A noise threshold was set at +/- 1.5, beyond which a window was defined as a 'gap' (points), i.e. it has significantly fewer reads than it should.
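The windowed comparison in the Figure 11 legend can be sketched as follows. The legend does not give the exact normalisation or how empty windows were handled, so two assumptions are made here and labelled in the code: the old-method coverage is rescaled by the ratio of mapped reads, and a pseudocount of 1 is added to avoid taking the logarithm of zero. All function and parameter names are hypothetical.

```python
import math


def window_sums(depth, window=100):
    """Sum per-base coverage in consecutive windows along the genome."""
    return [sum(depth[i:i + window]) for i in range(0, len(depth), window)]


def log2_ratios(old_depth, new_depth, old_mapped, new_mapped,
                window=100, pseudocount=1.0):
    """log2(old/new) per window after library-size normalisation."""
    old_w = window_sums(old_depth, window)
    new_w = window_sums(new_depth, window)
    scale = new_mapped / old_mapped  # assumption: rescale old to new library size
    return [
        # assumption: the pseudocount keeps empty windows finite
        math.log2((o * scale + pseudocount) / (n + pseudocount))
        for o, n in zip(old_w, new_w)
    ]


def call_gaps(ratios, threshold=1.5):
    """Flag windows beyond the noise threshold and name the method with fewer reads."""
    return [
        (i, "old" if r < 0 else "new")
        for i, r in enumerate(ratios)
        if abs(r) > threshold
    ]


# Toy example: the old method has no coverage in the second window.
old = [10] * 100 + [0] * 100
new = [10] * 200
ratios = log2_ratios(old, new, old_mapped=1, new_mapped=1)
gaps = call_gaps(ratios)
```

With these toy inputs the first window gives a ratio of 0 and the second is flagged as a gap in the old method, matching the sign convention in the legend.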
  • Figure 12A shows Tapestation images using standard sensitivity of ladder (left), Illumina adapter ligated product (middle) and hairpin ligated adapter product (right).
  • Figure 12B shows Tapestation images using high sensitivity of ladder (left), bisulfite treated illumina adapted DNA (middle) and
  • Figure 14 The effect of GC content on depth of coverage
  • (a) A genome browser view showing the coverage obtained across the P. berghei apicoplast for both methods. While near constant for ReBuilT, there are distinct read pile-ups in PCR-BS that appear to track GC content.
  • FIG. 16 Duplication Rates. Duplicate reads obtained using read one only. The dashed horizontal line indicates the expected duplication rate given the read number and genome size. The ReBuilT libraries show a small increase over the expected value, while the PCR-BS libraries show over double the expected duplication rate. The observed duplication rate includes PCR duplicates, but is also affected by uneven coverage. Local increases in coverage will increase the observed duplication rate over the expected.
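The legend does not state how the expected duplication rate was derived. One common way to estimate it, shown here purely as an assumption, treats read start sites as sampled uniformly at random from the available positions and uses the Poisson approximation for the number of distinct positions hit. The function name and example numbers are hypothetical.

```python
import math


def expected_duplication_rate(n_reads, n_positions):
    """Expected fraction of duplicate reads under uniform random sampling.

    n_positions is the number of distinct possible read starts (for
    example genome length times two strands). The expected number of
    distinct positions hit is n_positions * (1 - exp(-n_reads / n_positions)).
    """
    if n_reads == 0:
        return 0.0
    distinct = n_positions * -math.expm1(-n_reads / n_positions)
    return 1.0 - distinct / n_reads


# Hypothetical numbers: sparse and saturating sampling of 1e6 positions.
sparse = expected_duplication_rate(1_000, 1_000_000)
dense = expected_duplication_rate(1_000_000, 1_000_000)
```

Observed rates above this baseline can come from PCR duplicates or from uneven coverage, as the legend notes, so the baseline alone does not separate the two causes.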
  • a method of preparing a nucleic acid library comprising;
  • a method according to any one of the preceding embodiments comprising treating the nucleic acids with bisulfite such that unmodified cytosine residues in said molecules are converted to uracil .
  • a method according to any one of embodiments 1.6 to 1.8 comprising determining the identity of a base in one or more nucleic acids in the library at a position that corresponds to cytosine in the non-bisulfite treated nucleic acids.
  • nucleic acids are labelled with O-ethylhydroxylamine before said treatment with bisulfite.
  • a method according to any one of the preceding embodiments comprising isolating double-stranded nucleic acids having an adaptor at a first end and a second adaptor at a second end.
  • a method according to any one of embodiments 1.1 to 1.16 comprising immobilising the population of nucleic acids on a solid support following production of the complementary strands.
  • oligonucleotide primer is linked to a binding tag and the method comprises binding the tag to a capture member immobilised on a solid support thereby immobilising the population of nucleic acids.
  • nucleic acid library comprising nucleic acid strands having a first adaptor sequence at a first end and a second adaptor sequence at a second end.
  • nucleic acids are DNA molecules .
  • the DNA molecules are genomic DNA molecules.
  • nucleic acids from one of: a cell, a sample of cells, and a biological fluid sample,
  • a method according to embodiment 1.31 comprising modifying the 3' ends of the nucleic acids.
  • 1.33 A method according to embodiment 1.32 wherein the 3' ends are modified by the addition of an overhanging adenine residue.
  • 1.34 A method according to any one of the preceding embodiments wherein the double-stranded nucleic acids in the population comprise a one base 3' overhang consisting of an adenine residue.
  • 1.38 A method according to embodiment 1.36 or embodiment 1.37 comprising ligating the complex to the population such that the adaptor oligonucleotide of the complex is covalently linked to the 3' ends of the double-stranded nucleic acids and the complementary oligonucleotide of the complex is not linked to the 5' ends of the double-stranded nucleic acids.
  • 1.39 A method according to embodiment 1.38 wherein the double-stranded nucleic acids in the population lack 5' phosphate groups.
  • the adaptor sequence is ligated to the 3' ends of the double-stranded nucleic acids and the complementary sequence is ligated to the 5' ends of the double-stranded nucleic acids, and cleaving the 5' ends of the nucleic acids to remove the complementary sequence.
  • said hairpin adaptor comprising a hairpin sequence that links the adaptor sequence and the complementary sequence.
  • hairpin adaptor comprises a first cleavage site at the 3' end of the complementary sequence and a second cleavage site at the 5' end of the adaptor sequence .
  • nucleic acids are treated with bisulfite after ligation of the double-stranded adaptor and before cleavage of the 5' ends.
  • 1.47 A method according to any one of the preceding embodiments comprising modifying the second ends of the double stranded nucleic acids generated by extension of the oligonucleotide primer.
  • 1.48 A method according to embodiment 1.47 comprising adding a 5' phosphate group and a 3' adenine residue to the second end of the double-stranded nucleic acids.
  • a method according to embodiment 1.48 comprising;
  • a kit for use in the preparation of a nucleic acid library according to any one of embodiments 1.1 to 1.49 comprising;
  • a hairpin adaptor comprising an adaptor sequence and a complementary sequence
  • oligonucleotide consists of nucleotide analogues or modified nucleotides.
  • kit according to any one of embodiments 1.50 to 1.52 further comprising a bisulfite reagent
  • kit according to any one of embodiments 1.50 to 1.53 further comprising one or more nucleic acid isolation reagents.
  • kits according to any one of embodiments 1.50 to 1.54 further comprising one or more end-repair reagents.
  • 1.56 A kit according to embodiment 1.55 wherein the end-repair reagents do not include a 5' kinase.
  • kits according to any one of embodiments 1.50 to 1.56 further comprising one or more end-modification reagents.
  • kits according to any one of embodiments 1.50 to 1.57 further comprising one or more cleavage reagents for cleavage of the hairpin primer .
  • kits according to embodiment 1.58 wherein the cleavage reagents comprise formamidopyrimidine [fapy]-DNA glycosylase.
  • a kit according to any one of embodiments 1.50 to 1.59 further comprising a solid support.
  • kits according to any one of embodiments 1.50 to 1.60 further comprising one or more sequencing reagents.
  • the adaptor sequence is ligated to the 3' ends of the double-stranded nucleic acids and the complementary sequence is ligated to the 5' ends of the double-stranded nucleic acids, and cleaving the nucleic acids to remove the complementary sequence.
  • a method according to any one of embodiments 1.63 to 1.70 comprising isolating the double-stranded nucleic acids having an adaptor at a first end and a second adaptor at a second end.
  • the recovery after bisulfite treatment (ReBuilT) method begins with fragmentation, end repair and A-tailing.
  • ddT dideoxythymidine
  • the presence of a 3' ddT prevents ligation to the 5' end of the insert DNA, resulting in a single stranded directional ligation to the 3' insert terminus
  • a primer extension step with a high fidelity uracil tolerant polymerase is performed to generate blunt ended double stranded DNA, which is immobilized on streptavidin coated magnetic beads via the biotin label.
  • the immobilized DNA is end repaired and A-tailed before ligation of a fully complementary adapter.
  • To generate sequenceable fragments we copy the bisulfite-converted strands by single primer extension. These new strands contain only the canonical DNA bases (A, T, G and C), which is necessary as standard next-generation sequencing platforms are incompatible with uracil-containing DNA.
  • A, T, G and C canonical DNA bases
  • the first directional ligation prevents the formation of adapter dimers, a common sequencing contaminant that leads to non-insert sequencing reads.
  • Adapter dimers forming during the second ligation have no impact on library composition, as they are completely removed during washing of the beads.
  • the immobilization on beads enables near lossless library manipulation.
  • the ReBuilT libraries retain approximately double the percentage of raw data for methylation calling when compared to the PCR-BS libraries.
  • the ReBuilT method therefore yields considerably more usable data, which reduces the sequencing power required for methylation analysis.
  • duplication rate (Figure 16).
  • the ReBuilT libraries were found to have an average duplication rate of 16%.
  • the PCR-BS sample has almost double the duplication rate, at 30%, which is a cumulative effect of amplification duplicates and extremely uneven coverage. Uneven coverage leads to peaks and troughs in read depth, which will locally raise or lower the expected duplication rate.
  • PCR-BS samples exhibit a strong preference for the relatively GC rich windows.
  • the preference for a balanced base composition has the potential to introduce two types of artifacts when analyzing the methylome. Firstly, the quantification of methylation levels will be affected. As 5mC bases are not converted to thymine during bisulfite treatment, DNA fragments containing methylated loci will tend to have a higher GC content. As GC content can clearly affect amplification efficiency, it is no longer correct to determine the methylation level at a site with the C/(C+T) formula. Secondly, certain biological features display
  • Apicomplexan parasites, of which Plasmodium is one, have a non-photosynthetic relict plastid called the apicoplast that codes for proteins that participate in lipid biosynthesis and iron metabolism.
  • This organelle contains multiple copies of a 35 kb genome, and it has been suggested that it is unmethylated (Ponts, N. et al. Genome-wide mapping of DNA methylation in the human malaria parasite Plasmodium falciparum. Cell Host Microbe 14, 696-706 (2013)). We determined the average number of genome copies to be 5.5, and detected significant methylation along its sequence.
  • the other sample was used to recover fragmented DNA that still contained the 3' biotinylated primer site by doing a primer extension step that resulted in 5' blunt end double stranded DNA. After end repair and A-tailing, the 5' adaptor was ligated onto the DNA fragments. Starting from the primer extension step, all the steps were done on streptavidin coated magnetic beads, facilitating the purification steps and thus minimizing loss of DNA. After the last wash step, the beads were suspended in 15 µL water similarly to the control sample. qPCR was performed on 1 µL of a 200-fold diluted sample (Figure 7) and post-qPCR DNA was run on a Tapestation (Figure 8) to evaluate the fragment distribution.
  • the Ct value (Figure 7) can be used as a measurement of the quantity of amplifiable fragments in the sample, i.e., fragments that contain intact primer sites.
  • the control bisulfite sample has a
  • fragments (100 bp), which is what would be seen for a standard bisulfite sample, a range of fragments can be seen below 100 bp (Figure 6). These are generated by the repair of fragments that would otherwise not be amplifiable.
  • 500 ng of human genomic DNA was prepared and BS treated using the PCR free method described above. 35 bp paired ends were sequenced on an Illumina MiSeq instrument.
  • E. coli genomic DNA (500 ng) was prepared and BS treated i) using the PCR free method described above and ii) using a standard BS preparation with 15 cycles of PCR to produce indexed libraries.
  • the two libraries of 50 bp paired ends were sequenced in one run using an Illumina MiSeq instrument.
  • the small E. coli genome was chosen to exemplify the drastic changes in sequence coverage that are induced by PCR amplification. Changes in coverage across the genome are important because biases affect the quantitative power of BS-sequencing. BS quantitation is
  • Figures 9 and 10 illustrate the advantages of PCR free library preparation as described herein in improving genomic coverage compared to standard BS treatment methods.
  • the boxed sequence in Figure 9 has a highly skewed base composition
  • This bisulfite-converted region is classified as hugely AT-rich, due to having 91% AT composition.
  • the average coverage in this AT-rich region was over 10x higher in the PCR free sample.
  • AT-rich regions are known to be poorly amplified by PCR; the greater the AT skew, the poorer the observed amplification and hence the fewer reads in the PCR amplified library.
  • the boxed post-bisulfite sequence in Figure 10 has a highly skewed base composition of 88% AT.
  • a 20 base pair region in the centre of the box has zero aligned reads in the traditional BS prep; however, there is little change from the average coverage in the new recovery protocol.
  • SEQ ID NO: 2 Adaptor for Illumina sequencing (TruSeq) with index sequence underlined.
  • genomic DNA was digested using DNA degradase (Zymo Research) according to manufacturer's instructions, with stable isotope labelled nucleotides (dC+3, m5C+3, hm5dC+3 and N6mdA+3) spiked in at 25 nM final concentration.
  • a dilution series (0.0125 - 15000 nM) of the unlabelled reference standards (dC, m5C, m5hmC and N6mA; Sigma Aldrich, Carbosynth Ltd) were mixed with the stable isotope labelled nucleosides.
  • Quantitative LC-MS/MS analysis was carried out using an Agilent 1290 Infinity UHPLC coupled to a Thermo Q-exactive mass spectrometer.
  • LC was performed on a Waters Acquity UPLC HSS T3 column (100 x 2.1 mm, 1.8 µm particle size) kept at 50°C, applying a gradient starting at 100% of 0.1% formic acid in water followed by increasing proportions of 0.1% formic acid in acetonitrile up to 30%, at a flow rate of 350 µL/min over 3 minutes.
  • the MS was operated in positive ion mode.
  • Adapter pair 2 ODN2a + ODN2b
  • ODN1a was obtained by employing terminal deoxynucleotidyl
  • oligomer ODN1b was purified by ethanol precipitation, and resuspended in 10 mM Tris, 50 mM NaCl.
  • the underlined six-nucleotide portion of oligomer ODN1b was varied to give different adapter barcodes. All cytosines in ODN1b were replaced with 5mC to retain the adapter sequence following bisulfite conversion.
  • Adapter pairs were annealed in a thermocycler (95 °C for 10 minutes, cooling to 70 °C over 10 minutes, holding at 70 °C for 10 minutes and then slowly cooling to RT at 0.1 °C s⁻¹) to give 25 µM solutions in 10 mM Tris-HCl pH 7. , 50 mM NaCl.
  • Annealing ODN1a and ODN1b generated adapter pair 1; annealing ODN2a + ODN2b generated adapter pair 2.
  • the reaction mixture was incubated with 60 µg of streptavidin coated magnetic beads (Magnasphere Paramagnetic Particles, Promega) in 2x binding buffer (10 mM Tris-HCl pH 7.4, 1 mM EDTA, 2 M NaCl, 0.1% Tween 20) for 20 minutes at room temperature. Beads were washed three times with 400 µL binding buffer before being end repaired. Beads were again washed three times with 400 µL binding buffer before dA-tailing, and a further three times with 400 µL binding buffer before ligation of adapter pair B. Finally, three washes with 400 µL binding buffer were followed by elution of the A,T,G,C strand with 50 mM NaOH at 60 °C for 15 minutes.
  • 2x binding buffer 10 mM Tris-HCl pH 7.4, 1 mM EDTA, 2 M NaCl, 0.1% Tween 20
  • mapping quality of reads mapped with more than 10% of mismatches was reset to 0 using resetHighMismatchReads.py (code.google.com/p/bioinformatics-misc/source/browse).
  • Overlapping read pairs were clipped using clipOverlap in the BamUtil suite version 1.0.12. Genomic data manipulations were facilitated by samtools, BEDTools, Picard
  • Runs of methylated cytosines were detected by segmenting the signal of combined p-values.
  • the vector of combined p-values was first converted to a vector of discrete observations as follows: '0' if p > 0.1, '1' if 0.1 ≥ p > 0.05, '2' if 0.05 ≥ p > 0.001 and '3' if p ≤ 0.001.
  • HMM two state hidden Markov model
  • the R package RHmm was used for model fitting (Taramasco, O. & Bauer, S. RHmm: Hidden Markov Models simulations and estimations. (2013)).
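
The discretisation of combined p-values described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the source: the function name `discretize_p_values` is hypothetical, and the handling of values falling exactly on a threshold (0.1, 0.05, 0.001) is an assumption, since the comparison signs are not fully legible in the source text. The hidden Markov model fitting itself, which the source performs with the R package RHmm, is not reproduced here.

```python
from typing import Iterable, List


def discretize_p_values(p_values: Iterable[float]) -> List[int]:
    """Map combined p-values to the discrete symbols 0-3.

    Assumed bin edges (boundary handling is an assumption):
      0 if p > 0.1
      1 if 0.05 < p <= 0.1
      2 if 0.001 < p <= 0.05
      3 if p <= 0.001
    """
    symbols: List[int] = []
    for p in p_values:
        # Reject anything that is not a valid probability (this also catches NaN).
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p!r}")
        if p > 0.1:
            symbols.append(0)
        elif p > 0.05:
            symbols.append(1)
        elif p > 0.001:
            symbols.append(2)
        else:
            symbols.append(3)
    return symbols


if __name__ == "__main__":
    # Hypothetical p-values, one per cytosine position.
    print(discretize_p_values([0.5, 0.08, 0.01, 0.0001]))
```

The resulting vector of symbols is the kind of discrete observation sequence that a two state hidden Markov model can then segment into runs of methylated and unmethylated cytosines.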

Abstract

The present invention relates to the preparation of nucleic acids, for example bisulfite-treated nucleic acids, for the analysis of modified cytosine marks. The invention also relates to a method of preparing a library of bisulfite-treated nucleic acids comprising a two-step ligation procedure, in which a first adaptor is added before bisulfite treatment and a second adaptor is added afterwards.
EP15713006.3A 2014-03-24 2015-03-24 Nucleic acid preparation method Withdrawn EP3122879A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1405226.0A GB201405226D0 (en) 2014-03-24 2014-03-24 Nucleic acid preparation method
PCT/GB2015/050871 WO2015145133A1 (fr) 2015-03-24 Nucleic acid preparation method

Publications (1)

Publication Number Publication Date
EP3122879A1 true EP3122879A1 (fr) 2017-02-01

Family

ID=50686796

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15713006.3A Withdrawn EP3122879A1 (fr) 2014-03-24 2015-03-24 Nucleic acid preparation method

Country Status (3)

Country Link
EP (1) EP3122879A1 (fr)
GB (1) GB201405226D0 (fr)
WO (1) WO2015145133A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9115386B2 (en) 2008-09-26 2015-08-25 Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
PL2737085T3 (pl) 2011-07-29 2017-06-30 Cambridge Epigenetix Limited Sposoby wykrywania modyfikacji nukleotydów
EP3351644B1 (fr) 2012-11-30 2020-01-29 Cambridge Epigenetix Limited Agent oxydant pour nucléotides modifiés
GB201403216D0 (en) 2014-02-24 2014-04-09 Cambridge Epigenetix Ltd Nucleic acid sample preparation
JP6743268B2 (ja) 2016-03-25 2020-08-19 カリウス・インコーポレイテッド 合成核酸スパイクイン
ES2882329T3 (es) * 2016-04-07 2021-12-01 Univ Leland Stanford Junior Diagnóstico no invasivo por secuenciación de ADN fuera de las células 5-hidroximetilado
AU2019351130A1 (en) 2018-09-27 2021-04-08 Grail, Llc Methylation markers and targeted methylation probe panel
CA3118990A1 (fr) * 2018-11-21 2020-05-28 Karius, Inc. Procedes, systemes et compositions de bibliotheque directe
WO2021097252A1 (fr) * 2019-11-13 2021-05-20 Bradley Bernstein Dosage de méthylation et leurs utilisations

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009132315A1 (fr) * 2008-04-24 2009-10-29 Life Technologies Corporation Procédé de séquençage et d'élaboration de la carte d'acides nucléiques cibles

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2015145133A1 *

Also Published As

Publication number Publication date
WO2015145133A1 (fr) 2015-10-01
GB201405226D0 (en) 2014-05-07

Similar Documents

Publication Publication Date Title
WO2015145133A1 (fr) Nucleic acid preparation method
DK2737085T3 (en) METHODS FOR DETECTING NUCLEOTID MODIFICATION
EP2619329B1 (fr) Capture directe, amplification et séquençage d'adn cible à l'aide d'amorces immobilisées
US11384383B2 (en) In vitro isolation and enrichment of nucleic acids using site-specific nucleases
US11274335B2 (en) Methods for the epigenetic analysis of DNA, particularly cell-free DNA
WO2018195217A1 (fr) Compositions et procédés pour la construction de bibliothèques et l'analyse de séquences
US20190048406A1 (en) Oxidising Agent for Modified Nucleotides
US11608518B2 (en) Methods for analyzing nucleic acids
CN114901818A (zh) 靶向核酸文库形成的方法
Tost Current and emerging technologies for the analysis of the genome-wide and locus-specific DNA methylation patterns
WO2016034908A1 (fr) Procédés de détection d'une modification nucléotidique
EP3022321A2 (fr) Analyse miroir faisant appel au bisulfite
US20220307077A1 (en) Conservative concurrent evaluation of dna modifications
WO2023159250A1 (fr) Systèmes et procédés de capture ciblée d'acide nucléique et de codage à barres

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20161019

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20171120