WO2016016639A1 - Improved nucleic acid sample analysis using convertible tags - Google Patents

Improved nucleic acid sample analysis using convertible tags

Info

Publication number
WO2016016639A1
Authority
WO
WIPO (PCT)
Prior art keywords
cytosine
different
nucleic acid
adaptor
tag
Prior art date
Application number
PCT/GB2015/052183
Other languages
French (fr)
Inventor
Tobias William Barr Ost
Russell Smith HAMILTON
Original Assignee
Cambridge Epigenetix Ltd
Priority date
Filing date
Publication date
Application filed by Cambridge Epigenetix Ltd
Priority to EP15744329.2A (EP3174996A1)
Publication of WO2016016639A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers

Definitions

  • This invention relates to the preparation of nucleic acid samples for analysis.
  • Single stranded sample preparation is commonly used following bisulfite conversion of DNA molecules.
  • the bisulfite conversion process necessarily results in the formation of single stranded DNA, and therefore involves either i) pre-bisulfite sample preparation or ii) post-bisulfite sample preparation employing random priming for downstream analysis.
  • Drawbacks to these methods include the potential to generate nicked or fragmented libraries incapable of subsequent amplification, the loss of sequence information from the parent DNA molecules, and the generation of artefacts that contaminate the sample of interest or induce significant representation bias in the reads of the final dataset.
  • Bisulfite sequencing allows 5-methylcytosine to be distinguished from the unmethylated cytosine.
  • other cytosine modifications including 5-hydroxymethylcytosine and 5-formylcytosine have been identified.
  • techniques involving oxidation and/or reduction of the samples prior to bisulfite sequencing have been developed.
  • the sequencing output must be compared with a sample which has not undergone bisulfite treatment. Both 5-formylcytosine (5fC) and cytosine (C) are converted to uracil upon bisulfite treatment.
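The comparison between treatments can be written as simple subtraction. The sketch below assumes the readouts described in this document: bisulfite alone leaves 5-methylcytosine and 5-hydroxymethylcytosine read as C, oxidation first leaves only 5-methylcytosine read as C, and reduction first additionally leaves 5-formylcytosine read as C. The function name and the example fractions are hypothetical, chosen only for illustration.

```python
def estimate_modification_levels(bs_c, oxbs_c, redbs_c):
    """Estimate per-site cytosine fractions from the fraction of reads
    called as C after each treatment (simplified model, illustrative only).

    bs_c    : fraction read as C after bisulfite alone (mC + hmC)
    oxbs_c  : fraction read as C after oxidation then bisulfite (mC only)
    redbs_c : fraction read as C after reduction then bisulfite (mC + hmC + fC)
    """
    return {
        "mC": oxbs_c,            # only mC survives oxidation + bisulfite
        "hmC": bs_c - oxbs_c,    # hmC is lost when oxidised to fC
        "fC": redbs_c - bs_c,    # fC is gained when reduced to hmC
        "C": 1.0 - redbs_c,      # remainder is unmodified C in this model
    }


# Hypothetical read fractions at one genomic position.
levels = estimate_modification_levels(bs_c=0.5, oxbs_c=0.25, redbs_c=0.625)
print(levels)
```

Because each estimate depends on two separately prepared libraries, any library-to-library difference feeds directly into the subtraction, which is the source of variability the tagged adaptors are meant to reduce.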
  • the inventors have developed a set of molecular tag adaptors allowing the history of a sample to be followed.
  • the adaptors have a region having more than one type of cytosine base.
  • the use of standard adaptors without different cytosine bases does not allow the history of the sample to be followed as the adaptor sequences are not traceable.
  • the tag indexes of the invention on treatment with BS, oxBS or redBS become unique post conversion markers (convertible tags) and show what has happened to the sample.
  • the use of multiple tag index sequences allows a plurality of different samples to be analysed in parallel.
  • the use of a single tag adaptor allows a single library sample to be split into sub aliquots and each sub aliquot processed through a different conversion chemistry.
  • the different conversion chemistry causes different conversions to happen to the different cytosine bases within the tag sequence.
  • the profile of each molecule in the sequencing run can be determined from the resultant tag sequence, which changes depending on the history of chemical exposure.
  • the separately processed samples can thus be pooled together for sequence analysis. Each treated sample can be unambiguously resolved from the pool, and demultiplexed into separate bins determined by sample and conversion type.
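As an illustration of how a tag records its chemical history, the sketch below predicts the sequence read from a tag after each treatment and uses it to assign a read to a conversion type. The 10-base tag layout and all names are hypothetical; the readout table assumes complete conversion, with C and fC read as T after bisulfite, hmC additionally read as T after oxidation, and fC read as C after reduction.

```python
# Base read at each cytosine type after each treatment
# ("T" means the base was converted to uracil and is read as T).
READOUT = {
    "none":  {"C": "C", "mC": "C", "hmC": "C", "fC": "C"},
    "BS":    {"C": "T", "mC": "C", "hmC": "C", "fC": "T"},
    "oxBS":  {"C": "T", "mC": "C", "hmC": "T", "fC": "T"},
    "redBS": {"C": "T", "mC": "C", "hmC": "C", "fC": "C"},
}

# Hypothetical 10-base tag with one cytosine of each type.
TAG = ["C", "A", "G", "mC", "A", "hmC", "G", "A", "fC", "G"]


def tag_after(treatment, tag=TAG):
    """Return the sequence read from the tag after the given treatment."""
    table = READOUT[treatment]
    # A and G are not in the table and pass through unchanged.
    return "".join(table.get(base, base) for base in tag)


# Map each expected readout back to the treatment that produces it.
EXPECTED = {tag_after(treatment): treatment for treatment in READOUT}


def classify(read_tag):
    """Assign a sequenced tag to a treatment, or 'unassigned' if unknown."""
    return EXPECTED.get(read_tag, "unassigned")


for treatment in READOUT:
    print(treatment, tag_after(treatment))
```

The four predicted readouts differ from one another, so reads from pooled aliquots can be binned by treatment from the tag alone.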
  • the benefits of the tags include reducing the cost of library construction and sequencing. A single library serves all conversion chemistries, there is no need for any bisulfite specific sample preparation steps. Further advantages include reducing the sources of technical errors and variability induced by the vagaries of library construction. All converted samples share the same, identical starting library, eliminating any library to library construction differences.
  • the sequence is a 10-mer with 4 cytosine bases, one of each type (C, mC, hmC and fC).
  • C: unmodified cytosine base
  • mC: 5-methylcytosine base
  • hmC: 5-hydroxymethylcytosine base
  • fC: 5-formylcytosine base
  • the disclosure includes a nucleic acid adaptor having a tag sequence having at least two different cytosine bases, including one or more modified cytosine bases.
  • the tag sequence has a first cytosine base selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second, different cytosine base selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine.
  • the term different in this context means chemically or structurally different, not just in an alternative location or of alternative sequence.
  • the disclosure includes a nucleic acid adaptor having a tag sequence having at least three chemically different cytosine bases, including two or more modified cytosine bases.
  • the tag sequence has a first cytosine base which is unmodified cytosine, a second cytosine base which is 5-methylcytosine, and a third cytosine base which is selected from 5-formylcytosine or 5-hydroxymethylcytosine.
  • the term different in this context means chemically or structurally different, not just in an alternative location or of alternative sequence.
  • the adaptor may have at least 4 cytosine bases, including 4 chemically different cytosine bases.
  • the four different cytosine bases may be a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base and a 5-hydroxymethylcytosine base.
  • the tag sequence on the adaptor may be 5-20 bases in length.
  • the tag sequence on the adaptor may be 6-12 bases in length.
  • the tag sequence may be 10 bases in length.
  • the design of the tag may be such that the different cytosine bases are not adjacent in the sequence.
  • the tag may be of type C1XC2XC3XC4X where C1-4 are the different cytosines, and X is one or more nucleotides selected from T, A or G.
  • X can be chosen such that the sequence contains a purine (A or G).
  • the adaptor can be constructed such that each of the cytosine bases is separated from the next by at least one purine.
  • Sequences for the tag sequences may be selected from one or more of the sequences listed below:
  • each C is a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base or a 5-hydroxymethylcytosine base, such that each sequence has one of each different type of modification.
  • the modifications can be in any order such that any of the C bases can be any modification, providing all four modifications are present in any 10-mer, and no sequence has two identical C bases.
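The design rules described above (a tag of 5-20 bases, one cytosine of each type, cytosines not adjacent and separated by at least one purine) can be checked mechanically. The following is a minimal sketch under those assumptions; the function name, the representation of a tag as a backbone string plus an ordered list of modifications, and the example sequences are all hypothetical.

```python
CYTOSINE_TYPES = {"C", "mC", "hmC", "fC"}


def check_tag(sequence, modifications):
    """Return a list of rule violations for a candidate tag (empty if none).

    sequence      : backbone written with A, C, G, T (C marks a cytosine position)
    modifications : cytosine type at each C position, in 5' to 3' order
    """
    problems = []
    if not 5 <= len(sequence) <= 20:
        problems.append("length outside 5-20 bases")
    c_positions = [i for i, base in enumerate(sequence) if base == "C"]
    if len(c_positions) != len(modifications):
        problems.append("one modification is needed per C position")
    if set(modifications) != CYTOSINE_TYPES or len(modifications) != 4:
        problems.append("tag must carry C, mC, hmC and fC exactly once each")
    # Inspect the spacer between each pair of neighbouring cytosines.
    for left, right in zip(c_positions, c_positions[1:]):
        spacer = sequence[left + 1:right]
        if not spacer:
            problems.append(f"cytosines at {left} and {right} are adjacent")
        elif not any(base in "AG" for base in spacer):
            problems.append(f"no purine between cytosines at {left} and {right}")
    return problems


mods = ["C", "mC", "hmC", "fC"]
print(check_tag("CAGCACGACG", mods))  # passes every rule
print(check_tag("CCAGCACGAG", mods))  # adjacent cytosines
print(check_tag("CTCAGCAGCG", mods))  # pyrimidine-only spacer
```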
  • the tag sequence may be part of a larger oligonucleotide.
  • the tagged adaptor may also have a further region for hybridising a primer.
  • the larger oligonucleotide adaptor may be part of a single stranded or double stranded adaptor attached to the end of the nucleic acid fragments to be analysed.
  • the adaptor sequences may contain methylated C bases instead of C bases. Conventional C nucleotides become uracil bases upon bisulfite treatment, and it may be advantageous to avoid transforming the adaptors with bisulfite.
  • the adaptors may therefore contain G, A, T and methyl C bases, with the tag regions only having C, formyl C or hydroxymethyl C bases.
  • Examples of types of adaptors carrying tag sequences (shown as 9 or 10-mers in the examples, not to be limited to 9 or 10-mers in reality), are shown in the example below. As can be seen, the adaptors having the tags are larger than just the tags.
  • the term tagged adaptors refers to the adaptors having the tag sequences included therein.
  • Each cytosine base shown as '5' in the sequences above is 5-methylcytosine.
  • the adaptors are 'forked adaptors' having a region of double stranded sequence and two regions of single stranded sequence.
  • Each adaptor contains a tag in one single stranded region of the 'fork'.
  • each adaptor in the example above contains a first strand where every C base is 5-methyl C, and a second strand where every C base except the C bases in the tag is 5-methyl C.
  • the term adaptor can apply to either the single stranded oligonucleotides or the hybridised pair of oligonucleotides.
  • the invention also includes a nucleic acid sample labelled with a nucleic acid adaptor as herein described. The sample may be fragmented prior to attachment of the tagged adaptor.
  • kits comprising multiple different sequences where each adaptor has two or more different cytosines.
  • kits comprising a first nucleic acid adaptor having a tag sequence of 5-20 bases including at least two different cytosine bases in the tag and a second nucleic acid adaptor having a different tag sequence of 5-20 bases including at least two different cytosine bases in the tag.
  • the kit may have more than 2 adaptors.
  • the kit may have adaptors with at least 4 different tag sequences, each having at least two different cytosine bases in the tag.
  • the kit may have adaptors with at least 10 different tag sequences, each having at least two different cytosine bases in the tag.
  • the kit may have adaptors with at least 24 different tag sequences, each having at least two different cytosine bases in the tag.
  • the sequences of the tags may be selected from the sequences shown above.
  • kits comprising multiple different sequences where each adaptor has three or more different cytosines.
  • kits comprising a first nucleic acid adaptor having a tag sequence of 5-20 bases including at least three different cytosine bases in the tag and a second nucleic acid adaptor having a different tag sequence of 5-20 bases including at least three different cytosine bases in the tag.
  • the kit may have more than 2 adaptors.
  • the kit may have adaptors with at least 4 different tag sequences, each having at least three different cytosine bases in the tag.
  • the kit may have adaptors with at least 10 different tag sequences, each having at least three different cytosine bases in the tag.
  • the kit may have adaptors with at least 24 different tag sequences, each having at least three different cytosine bases in the tag.
  • the sequences of the tags may be selected from the sequences shown above.
  • the method may include the following steps;
  • the method may include additional steps.
  • the method may include the step of oxidising and/or reducing the sample prior to bisulfite treatment.
  • the bisulfite treatment of the reduced or oxidised sample may take place separately, or with the sample mixed together.
  • the method may include the steps of;
  • nucleic acid adaptor having at least 4 chemically different cytosine bases, b) fragmenting a nucleic acid sample
  • the inventors have developed a set of molecular tag adaptors allowing the chemical exposure history of the molecules in a sample to be followed.
  • the adaptors have a region having more than one type of cytosine base.
  • the modified region may be the only region on the adaptor having a C base which is not 5-methylcytosine.
  • Standard adaptors for use in bisulfite sequencing are usually methylated, and are thus unaffected by bisulfite treatment. Unmethylated cytosine bases in an adaptor, by contrast, become uracil bases upon bisulfite treatment. Thus the inclusion of both cytosine and methylated cytosine bases in the adaptor attached to a sample, a portion of which undergoes bisulfite treatment, allows identification of whether or not the particular molecules in the sample have undergone bisulfite treatment once the adaptors are sequenced. Similarly, the use of 5-hydroxymethylcytosine or 5-formylcytosine bases in the adaptor allows identification of whether the samples have been oxidised or reduced prior to bisulfite treatment. The use of all four chemically different cytosine bases in the adaptor attached to a nucleic acid allows the exposure history of the strands in the sample to be followed in a single sequencing run.
  • the use of multiple tag index sequences allows a plurality of different samples to be analysed in parallel. Thus for example a number of different samples from different biological origins can be processed in parallel.
  • the concept of indexing samples using different molecular sequences on adaptors is known, and can be applied in context. Thus the use of say 24 different adaptors having a different order of A, G, C and T bases, where each of the 24 adaptors has more than one type of C base allows the processing and bisulfite analysis of 24 samples to be achieved in a single sequencing run.
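One practical consequence of indexing with convertible tags is that a cytosine position may be read as C or T depending on treatment, so the sample index has to stay recognisable either way. The sketch below illustrates one possible approach, which is an assumption rather than a method stated in this document: compare tags after replacing every C with T. The two tag backbones and all names are hypothetical.

```python
# Hypothetical tag backbones for two samples (C marks a cytosine position).
SAMPLE_TAGS = {
    "sample_1": "CAGCACGACG",
    "sample_2": "CGACGACAGC",
}


def collapse(seq):
    """Replace every C with T, mimicking complete conversion of the tag."""
    return seq.replace("C", "T")


def tags_are_resolvable(tags=SAMPLE_TAGS):
    """True if every tag is still unique after the C-to-T collapse."""
    collapsed = {collapse(tag) for tag in tags.values()}
    return len(collapsed) == len(tags)


def assign_sample(read_tag, tags=SAMPLE_TAGS):
    """Return the sample whose collapsed tag matches the read, else None."""
    key = collapse(read_tag)
    matches = [name for name, tag in tags.items() if collapse(tag) == key]
    return matches[0] if len(matches) == 1 else None


print(tags_are_resolvable())
print(assign_sample("TAGCACGATG"))  # a partly converted read of the first tag
```

Once the sample is known, the pattern of C and T calls at the cytosine positions can be compared with the expected readout for each treatment to recover the conversion type.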
  • the concept of indexing is useful in areas where the analysis of small sized genomes is envisaged, for example in sequencing large numbers of microbial samples.
  • the use of a single tag adaptor allows a single library sample to be split into sub aliquots and each sub aliquot processed through a different conversion chemistry.
  • a part of the sample can be oxidised, a further part of the sample reduced, these can be treated with bisulfite along with a further part of the sample, and the bisulfite treated samples can be pooled with a further untreated part and sequenced.
  • the different conversion chemistry causes different conversions to happen to the different cytosine bases within the tag sequence. Thus what has happened to each molecule can be seen once the tags are sequenced.
  • the separately processed samples can be pooled together for sequence analysis.
  • the disclosure includes a nucleic acid adaptor having a tag sequence having at least two different cytosine bases, including one or more modified cytosine bases.
  • the disclosure includes an adaptor having A, G, T and 5-methylC bases apart from a tag region where non-methylated C bases (C, hmC or fC bases) are present.
  • the tag sequence has a first cytosine base selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second, different cytosine base selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine.
  • the term different in this context means chemically or structurally different, not just in an alternative location or of alternate sequence.
  • the disclosure includes a nucleic acid adaptor having a tag sequence having at least three chemically different cytosine bases, including two or more modified cytosine bases.
  • the tag sequence has a first cytosine base which is unmodified cytosine, a second cytosine base which is 5-methylcytosine, and a third cytosine base which is selected from 5-formylcytosine or 5-hydroxymethylcytosine.
  • the term different in this context means chemically or structurally different, not just in an alternative location or of alternative sequence.
  • the tag may have at least 4 cytosine bases, including 4 chemically different cytosine bases.
  • the four different cytosine bases may be a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base and a 5-hydroxymethylcytosine base.
  • the tag sequence on the adaptor may be 5-20 bases in length.
  • the tag sequence on the adaptor may be 6-12 bases in length.
  • the tag sequence may be 10 bases in length.
  • the design of the tag may be such that the different cytosine bases are not adjacent in the sequence.
  • the tag may be of type C1XC2XC3XC4X where C1-4 are the different cytosines, and X is one or more nucleotides selected from T, A or G. X can be chosen such that each gap contains a purine (A or G).
  • the adaptor can be constructed such that the cytosine bases are separated by at least one purine.
  • Sequences for the tag sequences may be selected from one or more of the sequences listed below:
  • each C is a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base or a 5-hydroxymethylcytosine base, such that each sequence has one of each different type of modification.
  • the modifications can be in any order such that any of the C bases can be any modification, providing all four modifications are present in any 10-mer, and no sequence has two chemically identical C bases.
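Since the four modifications may be placed in any order, a backbone with four cytosine positions admits 4! = 24 distinct assignments. The sketch below enumerates them for a hypothetical 10-base backbone; the function name and the backbone are illustrative only.

```python
from itertools import permutations

CYTOSINE_TYPES = ("C", "mC", "hmC", "fC")


def modification_orderings(backbone):
    """Yield every assignment of the four cytosine types to the
    C positions of a backbone, as a list of per-position symbols."""
    c_positions = [i for i, base in enumerate(backbone) if base == "C"]
    if len(c_positions) != 4:
        raise ValueError("backbone must contain exactly four C positions")
    for order in permutations(CYTOSINE_TYPES):
        tag = list(backbone)
        for position, kind in zip(c_positions, order):
            tag[position] = kind
        yield tag


orderings = list(modification_orderings("CAGCACGACG"))
print(len(orderings))
print(orderings[0])
```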
  • the tag sequence may be part of a larger oligonucleotide.
  • the tagged adaptor may also have a further region for hybridising a primer.
  • the larger oligonucleotide adaptor may be part of a single stranded or double stranded adaptor attached to the end of the nucleic acid fragments to be analysed.
  • the adaptors may therefore contain G, A, T and methyl C bases, with the tag regions having only C, formyl C or hydroxymethyl C bases.
  • the invention also includes a nucleic acid sample labelled with a nucleic acid adaptor as herein described.
  • the sample may be fragmented prior to attachment of the tagged adaptor.
  • the population of nucleic acid molecules may be a sample of DNA or RNA, for example a genomic DNA sample.
  • Suitable DNA and RNA samples may be obtained or isolated from a sample of cells, for example, mammalian cells such as human cells or tissue samples, such as biopsies.
  • the sample may be obtained from a formalin fixed paraffin embedded (FFPE) tissue sample.
  • the population may be a diverse population of nucleic acid molecules, for example a library, such as a whole genome library or a locus-specific library.
  • Nucleic acid strands in the population may be amplified nucleic acid molecules, for example, amplified fragments of the same genetic locus or region from different samples.
  • Nucleic acid strands in the population may be enriched.
  • the population may be an enriched subset of a sample produced by pull-down onto a hybridisation array or digestion with a restriction enzyme.
  • the samples having the tagged adaptors may be further processed, for example by amplification or sequencing.
  • the joined oligonucleotides may be copied using a nucleic acid polymerase. If adaptors are attached to both ends of the target fragments, the population of fragments can be amplified using a single pair of primers complementary to the adaptors.
  • the tags can also be used to help identify sequences from different sources. If adaptors are used with different sequences for different sources of biological materials, then the different sources can be pooled but still identified via the tag when the tags are sequenced. Thus the disclosure herein includes the use of two or more different populations of adaptors for the multiplexing of the analysis of different samples. Disclosed herein therefore are kits containing two or more adaptors of different sequence.
  • the sequence of the adaptor oligonucleotide depends on the specific application and suitable adaptor oligonucleotides may be designed using known techniques.
  • a suitable adaptor oligonucleotide may, for example, consist of 20 to 100 nucleotides.
  • the sequence of the adaptor may be selected to be complementary to a suitable amplification/extension primer.
  • the method may be used in order to prepare samples for nucleic acid sequencing.
  • the method may be used to sequence a population of synthetic oligonucleotides, for example for the purposes of quality control.
  • the first oligonucleotides may come from a population of nucleic acid molecules from a biological sample.
  • the population may be fragments of between 100-10000 nucleotides in length.
  • the fragments may be 200-1000 nucleotides in length.
  • the fragments may be of random variable sequence.
  • the order of bases in the sequence may be known, unknown, or partly known.
  • the fragments may come from treating a biological sample to obtain fragments of shorter length than exist in the naturally occurring sample.
  • the fragments may come from a random cleavage of longer strands.
  • the fragments may be derived from shearing the sample using a physical method such as hydrodynamic shearing.
  • the fragments may be derived from treating a nucleic acid sample with a chemical reagent (for example sodium bisulfite, acid or alkali) or enzyme (for example with a restriction endonuclease or other nuclease).
  • the fragments may come from a treatment step that causes double stranded molecules to become single stranded.
  • Methods of the invention may be useful in preparing a population of nucleic acid strands for sequencing, for example a population of bisulfite-treated single-stranded nucleic acid fragments.
  • Bisulfite treatment produces single-stranded nucleic acid fragments, typically of about 250-1000 nucleotides in length.
  • the population may be treated with bisulfite by incubation with bisulfite ions (HSO₃⁻).
  • the use of bisulfite ions (HSO₃⁻) to convert unmethylated cytosines in nucleic acids into uracil is standard in the art and suitable reagents and conditions are well known.
  • the methods disclosed may further include the step of producing one or more copies of the first single stranded oligonucleotides.
  • the methods may include producing multiple copies of each of the different sequences.
  • the copies may be made by hybridising a primer sequence opposite a universal sequence on the second oligonucleotide sequence, and using a nucleic acid polymerase to synthesise a complementary copy of the first single stranded sequences.
  • the production of the complementary copy provides a double stranded polynucleotide.
  • the double stranded polynucleotides can be amplified using primers complementary to both strands.
  • the amplification can be locus-specific. Locus specific amplification only amplifies a selection of the fragments in the pool and is therefore a selective amplification for certain sequences.
  • adaptor sequences can be attached to both ends of the fragments. The attachment of known adaptors at both ends of each fragment can allow amplification of all the fragments in the pool as each fragment possesses two universal ends.
  • double stranded polynucleotides may be made circular by attaching the ends together.
  • double stranded molecules produced by extension of a primer annealed to the adaptor sequence may be circularised by ligation. This may be useful in the generation of circular nucleic acid constructs and plasmids or in the preparation of samples for sequencing using platforms that employ circular templates (e.g. PacBio SMRT sequencing).
  • populations of circularised 3' adapted nucleic acid fragments produced as described herein may be denatured and subjected to rolling circle or whole genome amplification using an amplification primer that hybridises to the 3'-adaptor oligonucleotide to produce a population of concatemeric products. Amplification of circular fragments can be carried out using primers complementary to two regions of the single adaptor sequence.
  • Random priming is used in techniques such as whole genome amplification (WGA). Having a universal primer on one end of a population of single stranded fragments and a random primer on the opposite end means that amplification is more efficient than having random primers on both ends, as is the case with WGA.
  • the tagged adaptor joined fragments can be used in any subsequent method of sequence determination.
  • the fragments can undergo parallel sequencing on a solid support.
  • the attachment of universal adaptors to each end may be beneficial in the amplification of the population of fragments.
  • Suitable sequencing methods are well known in the art, and include Illumina sequencing, pyrosequencing (for example 454 sequencing) and Ion Torrent sequencing (from Life Technologies™).
  • Populations of nucleic acid molecules with a 3' adaptor oligonucleotide and optionally a 5' second adaptor oligonucleotide may be sequenced directly.
  • the sequences of the first and second adaptor oligonucleotides may be specific for a sequencing platform.
  • they may be complementary to the flowcell or device on which sequencing is to be performed. This may allow the sequencing of the population of nucleic acid fragments without the need for further amplification and/or adaptation.
  • the first and second adaptor sequences are different.
  • the adaptor sequences and tag sequences are not found within the human genome.
  • the nucleic acid strands in the population may have the same first adaptor sequence at their 3' ends and the same second adaptor sequence at their 5' ends i.e. all of the fragments in the population may be flanked by the same pair of adaptor sequences. In such cases both strands in the duplex carry a tag sequence.
  • Suitable adaptor oligonucleotides for the production of nucleic acid strands for sequencing may include a region that is complementary to the universal primers on the solid support (e.g. a flowcell or bead) and a region that is complementary to universal sequencing primers (i.e. which when annealed to the adaptor oligonucleotide and extended allows the sequence of the nucleic acid molecule to be read).
  • Suitable nucleotide sequences for these interactions are well known in the art and depend on the sequencing platform to be employed. Suitable sequencing platforms include Illumina TruSeq, LifeTech IonTorrent, Roche 454 and PacBio RS.
  • the sequences of the first and second adaptor oligonucleotides may comprise a sequence that hybridises to complementary primers immobilised on the solid support (e.g. 20-30 nucleotides); a sequence that hybridises to a sequencing primer (e.g. 30-40 nucleotides); and a unique index sequence (e.g. 6-10 nucleotides).
  • Suitable first and second adaptor oligonucleotides may be 56-80 nucleotides in length.
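The 56-80 nucleotide range is what results from adding the three component ranges quoted above. A short check, with region labels that are shorthand introduced here for illustration:

```python
# (minimum, maximum) lengths in nucleotides, from the ranges quoted above.
REGIONS = {
    "support-binding sequence": (20, 30),
    "sequencing-primer site": (30, 40),
    "index sequence": (6, 10),
}

shortest = sum(low for low, _ in REGIONS.values())
longest = sum(high for _, high in REGIONS.values())
print(f"adaptor length: {shortest}-{longest} nucleotides")
```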
  • the adaptors may be configured as single strands containing both DNA and RNA, or as two or three strands.
  • the nucleic acid molecules may be purified by any convenient technique. Following preparation, the population of nucleic acid molecules may be provided in a suitable form for further treatment as described herein. For example, the population of nucleic acid molecules may be in aqueous solution in the absence of buffers before treatment as described herein.
  • populations of nucleic acid molecules with a 3' adaptor oligonucleotide and optionally a 5' adaptor oligonucleotide may be further adapted and/or amplified as required, for example for a specific application or sequencing platform.
  • the nucleic acid strands in the population may have the same first adaptor sequence at their 3' ends and the same second adaptor sequence at their 5' ends i.e. all of the fragments in the population may be flanked by the same pair of adaptors, as described above. This allows the same pair of amplification primers to amplify all of the strands in the population and avoids the need for multiplex amplification reactions using complex sets of primer pairs, which are susceptible to mis-priming and the amplification of artefacts.
  • Suitable first and second amplification primers may be 20-25 nucleotides in length and may be designed and synthesised using standard techniques.
  • a first amplification primer may hybridise to the first adaptor sequence i.e. the first amplification primer may comprise a nucleotide sequence complementary to the first adaptor oligonucleotide; and a second amplification primer may hybridise to the complement of the second adaptor sequence i.e. the second amplification primer may comprise the nucleotide sequence of the second adaptor oligonucleotide.
  • a first amplification primer may hybridise to the complement of the first adaptor sequence i.e. the first amplification primer may comprise the nucleotide sequence of the first adaptor oligonucleotide; and a second amplification primer may hybridise to the second adaptor sequence i.e. the second amplification primer may comprise a nucleotide sequence complementary to the second adaptor oligonucleotide.
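The relationship between an adaptor and its primers can be illustrated with a reverse-complement helper: a primer that hybridises to an adaptor is the adaptor's reverse complement, while a primer that hybridises to the adaptor's complement has the adaptor's own sequence. The adaptor sequence and the names below are hypothetical, and base modifications are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical adaptor sequence, written 5' to 3'.
adaptor = "GATTACAGGT"

# Primer that anneals to the adaptor itself.
primer_to_adaptor = reverse_complement(adaptor)

# Primer that anneals to the copied (complementary) strand of the adaptor.
primer_to_complement = adaptor

print(primer_to_adaptor)
print(primer_to_complement)
```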
  • the first and second amplification primers may incorporate additional sequences. Additional sequences may include index sequences to allow identification of the amplification products during multiplex sequencing, or further adaptor sequences to allow sequencing of the strands using a specific sequencing platform.
  • a portion of the nucleic acid sample may be oxidised using an oxidising agent.
  • the oxidising agent may be a non-enzymatic oxidising agent, for example, an organic or inorganic chemical compound.
  • Suitable oxidising agents are well known in the art and include metal oxides, such as KRuO₄, MnO₂ and KMnO₄.
  • Particularly useful oxidising agents are those that may be used in aqueous conditions, which are most convenient for the handling of the polynucleotide. However, oxidising agents that are suitable for use in organic solvents may also be employed where practicable.
  • the oxidising agent may comprise a perruthenate anion (RuO₄⁻).
  • Suitable perruthenate oxidising agents include organic and inorganic perruthenate salts, such as potassium perruthenate (KRuO₄) and other metal perruthenates; tetraalkyl ammonium perruthenates, such as tetrapropylammonium perruthenate (TPAP) and tetrabutylammonium perruthenate (TBAP); polymer-supported perruthenate (PSP) and tetraphenylphosphonium ruthenate.
  • the oxidising agent may be a metal (VI) oxo complex.
  • the oxidising agent may be manganate (Mn(VI)O₄²⁻), ferrate (Fe(VI)O₄²⁻), osmate (Os(VI)O₄²⁻), ruthenate (Ru(VI)O₄²⁻), or molybdate (Mo(VI)O₄²⁻).
  • the oxidising agent or the oxidising conditions may also preserve the polynucleotide in a denatured state.
  • the polynucleotides in the first portion may be purified.
  • nucleic acid purification may be performed using any convenient nucleic acid purification technique. Suitable nucleic acid purification techniques include spin-column chromatography.
  • the polynucleotide may be subjected to further, repeat oxidising steps. Such steps are undertaken to maximise the conversion of 5-hydroxymethylcytosine to 5-formylcytosine. This may be necessary where a polynucleotide has sufficient secondary structure that it is capable of re-annealing. Any annealed portions of the polynucleotide may limit or prevent access of the oxidising agent to that portion of the structure, which has the effect of protecting 5-hydroxymethylcytosine from oxidation.
  • the portion of the population of polynucleotides may for example be subjected to multiple cycles of treatment with the oxidising agent followed by purification. For example, one, two, three or more than three cycles may be performed.
  • a portion of the population of polynucleotides comprising the sample nucleotide sequence may be reduced. In other embodiments, a further portion of the population of polynucleotides comprising the sample nucleotide sequence may be reduced.
  • Reduction converts 5-formylcytosine residues in the sample nucleotide sequence into 5-hydroxymethylcytosine.
  • the portions of polynucleotides may be reduced by treatment with a reducing agent.
  • the reducing agent is any agent suitable for generating an alcohol from an aldehyde.
  • the reducing agent or the conditions employed in the reduction step may be selected so that any 5-formylcytosine is selectively reduced (i.e. the reducing agent or reduction conditions are selective for 5-formylcytosine). Thus, substantially no other functionality in the polynucleotide is reduced in the reduction step.
  • the reducing agent or conditions are selected to minimise or prevent any degradation of the polynucleotide.
  • Suitable reducing agents are well-known in the art and include NaBH₄, NaCNBH₄ and LiBH₄. Particularly useful reducing agents are those that may be used in aqueous conditions, as such are most convenient for the handling of the polynucleotide. However, reducing agents that are suitable for use in organic solvents may also be employed where practicable.
  • the reduced and oxidised portions of the population are treated with bisulfite.
  • a second portion of the population which has not been oxidised or reduced is also treated with bisulfite.
  • the bisulfite treatment can be done separately on the three samples, or the samples can be pooled so that the reduced, oxidised and untreated samples are all exposed to bisulfite in the same reaction.
  • Bisulfite treatment converts both cytosine and 5-formylcytosine residues in a polynucleotide into uracil. Where any 5-carboxycytosine is present (as a product of the oxidation step), this 5-carboxycytosine is converted into uracil in the bisulfite treatment. Without wishing to be bound by theory, it is believed that the reaction of the 5-formylcytosine proceeds via loss of the formyl group to yield cytosine, followed by a subsequent deamination to give uracil. The 5-carboxycytosine is believed to yield the uracil through a sequence of decarboxylation and deamination steps. Bisulfite treatment may be performed under conditions that convert both cytosine and 5-formylcytosine or 5-carboxycytosine residues in a polynucleotide as described herein into uracil.
  • a portion of the population may be treated with bisulfite by incubation with bisulfite ions (HSO₃⁻).
  • the use of bisulfite ions (HSO₃⁻) to convert unmethylated cytosines in nucleic acids into uracil is standard in the art and suitable reagents and conditions are well known to the skilled person. Numerous suitable protocols and reagents are also commercially available (for example, EpiTect™, Qiagen NL; EZ DNA Methylation™, Zymo Research Corp CA; CpGenome Turbo Bisulfite Modification Kit, Millipore).
  • kits comprising multiple different sequences where each adaptor has two or more different cytosines.
  • kits comprising a first nucleic acid adaptor having a tag sequence of 5-20 bases including at least two different cytosine bases in the tag and a second nucleic acid adaptor having a different tag sequence of 5-20 bases including at least two different cytosine bases in the tag.
  • the kit may have more than 2 adaptors.
  • the kit may have adaptors with at least 4 different tag sequences, each having at least two different cytosine bases in the tag.
  • the kit may have adaptors with at least 10 different tag sequences, each having at least two different cytosine bases in the tag.
  • the kit may have adaptors with at least 24 different tag sequences, each having at least two different cytosine bases in the tag.
  • the sequences of the tags may be selected from the sequences shown below:
  • each C base in each tag is different; one is C, one is methyl C, one is hydroxymethyl C and one is formyl C.
  • the kits may contain one or more nucleic acid acting enzymes such as nucleic acid polymerases or ligases.
  • the adaptor kits may contain a further nucleic acid complementary to a region of the first adaptor such that the adaptor is at least partly double stranded.
  • the tag sequences may be selected using the protocol below, or modifications thereto:
  • Each tag must contain at least 4 cytosines to represent each modification (C, mC, hmC, fC).
  • the tags may be incorporated into adaptors that are attached to nucleic acid fragments for sequencing.
  • a method of using the tagged adaptors for determining the methylation profile of a nucleic acid sample may include the following steps; a) preparing a nucleic acid adaptor having a tag sequence having at least two cytosine bases, including one or more modified cytosine bases, wherein a first cytosine base has a nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second cytosine base has a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine,
  • the method may include additional steps.
  • the method may include the step of oxidising and/or reducing the sample prior to bisulfite treatment.
  • the bisulfite treatment of the reduced or oxidised sample may take place separately, or with the sample mixed together.
  • the method may include the steps of;
  • nucleic acid adaptor having at least 4 chemically different cytosine bases, b) fragmenting a nucleic acid sample
  • the method may include the step of oxidising the sample prior to bisulfite treatment.
  • the method may include the steps of;
  • the oxidising agent may be a non-enzymatic oxidising agent, for example, an organic or inorganic chemical compound.
  • Suitable oxidising agents are well known in the art and include metal oxides, such as KRuO₄, MnO₂ and KMnO₄.
  • Particularly useful oxidising agents are those that may be used in aqueous conditions, which are most convenient for the handling of the polynucleotide.
  • oxidising agents that are suitable for use in organic solvents may also be employed where practicable. In such cases the three different cytosine bases may be cytosine, 5-methylcytosine and 5-hydroxymethylcytosine.
  • the oxidising agent may comprise a perruthenate anion (RuO₄⁻).
  • Suitable perruthenate oxidising agents include organic and inorganic perruthenate salts, such as potassium perruthenate (KRuO₄) and other metal perruthenates; tetraalkyl ammonium perruthenates, such as tetrapropylammonium perruthenate (TPAP) and tetrabutylammonium perruthenate (TBAP); polymer-supported perruthenate (PSP) and tetraphenylphosphonium ruthenate.
  • the oxidising agent may be a metal (VI) oxo complex.
  • the oxidising agent may be manganate (Mn(VI)O₄²⁻), ferrate (Fe(VI)O₄²⁻), osmate (Os(VI)O₄²⁻), ruthenate (Ru(VI)O₄²⁻), or molybdate (Mo(VI)O₄²⁻).
  • the method may include the step of reducing the sample prior to bisulfite treatment. The method may include the steps of;
  • the reducing agent may be borohydride.
  • Suitable reducing agents include NaBH₄, NaCNBH₄ and LiBH₄.
  • the three different cytosine bases may be cytosine, 5-methylcytosine and 5-formylcytosine.
  • CEG04 41 4 400 ng each) and one third (CEG04 41 4) was left native, one third (CEG04 41 5) was converted through the bisulfite-only half of the CEGX TrueMethyl kit (as per the manufacturer's protocol) and a third (CEG04 41 6) was converted through the oxidative bisulfite half of the CEGX TrueMethyl kit (as per the manufacturer's protocol). All samples were amplified using the PCR protocol in the TrueMethyl kit using the TrueMethyl polymerase. Amplicons were quantified and pooled at equimolar concentrations and used to prepare a 2 nM library solution to take forward for Illumina SBS sequencing.
  • This experiment demonstrates the use of convertible index tags to uniquely differentiate between samples derived from a common library processed using different conversion chemistries. Reads processed through BS, oxBS or untreated can be unambiguously deconvolved from a complex pool of indexed fragments.
  • Table 2 Summary of the mapping efficiencies of the Native, BS and oxBS treated samples

Abstract

The inventors have developed a set of molecular tag adaptors allowing the history of a sample to be followed. The adaptors have a region having more than one type of cytosine base.

Description

Improved nucleic acid sample analysis using convertible tags
This invention relates to the preparation of nucleic acid samples for analysis.
Many methods exist for the preparation of samples of double-stranded DNA, for example for sequencing (e.g. Illumina TruSeq and NextEra, 454, NEBnext, Life Technologies etc).
However, the preparation of single-stranded DNA samples is more challenging because single stranded DNA molecules cannot be efficiently ligated together enzymatically. Reported workflows for the preparation of single-stranded DNA rely on the use of primers with degenerate sequences that "randomly prime" the single-stranded DNA and allow a truncated version of the parent DNA molecule to be adapted (for example, Epigenome™ Methyl-Seq kit, Epicentre Technologies WI USA). Methods using RNA ligase or CircLigase to join ends of single stranded DNA together have been reported but suffer from poor efficiency or are limited to the size of DNA fragments that can be ligated together.
Single stranded sample preparation is commonly used following bisulfite conversion of DNA molecules. The bisulfite conversion process necessarily results in the formation of single stranded DNA, and therefore involves either i) pre-bisulfite sample preparation or ii) post-bisulfite sample preparation employing random priming for downstream analysis. Drawbacks to these methods include the potential to generate nicked or fragmented libraries incapable of subsequent amplification, the loss of sequence information from the parent DNA molecules, and the generation of artefacts that contaminate the sample of interest or induce significant representation bias of reads in the final dataset.
Bisulfite sequencing allows 5-methylcytosine to be distinguished from unmethylated cytosine. In addition to 5-methylcytosine, other cytosine modifications including 5-hydroxymethyl and 5-formyl have been identified. In order to differentiate between these different cytosine modifications, techniques involving oxidation and/or reduction of the samples prior to bisulfite sequencing have been developed. In order to extract value from bisulfite sequencing, the sequencing output must be compared with a sample which has not undergone bisulfite treatment. Both 5-formylcytosine (5fC) and cytosine (C) are converted to uracil upon bisulfite treatment. Reduction of the formyl group to give 5-hydroxymethylcytosine (5hmC) prior to bisulfite treatment allows C and 5fC to be distinguished. 5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are not affected by bisulfite. Oxidation of the hydroxymethyl group to a formyl group allows the two to be differentiated. A summary of the relevant transformations is shown below:
Figure imgf000003_0002
Table 1
The structures of the bases are shown below:
Figure imgf000003_0001
In order to differentiate between the up to four different types of cytosine bases, up to four separate sequencing runs are required on the same sample. In order to maximise sequencing throughput, it is advantageous if the treatment process carried out on each molecule can be identified such that the sequencing can be carried out in a single process.
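The reasoning above, that several differently treated read-outs are needed to tell the four cytosine types apart, can be sketched in code. The following is an illustration only and not part of the disclosure: it assumes that each treatment gives an unambiguous C or T call at a position (uracil being read as T after amplification), and the names `CALLS` and `infer_cytosine` are hypothetical.

```python
# Illustrative sketch only; names and structure are assumptions, not from the source.
# Keys are the calls observed at one position after bisulfite (BS), oxidative
# bisulfite (oxBS) and reductive bisulfite (redBS) treatment, in that order.
CALLS = {
    ("T", "T", "T"): "C",     # unmodified cytosine is converted in every treatment
    ("C", "C", "C"): "5mC",   # 5-methylcytosine is not converted by any treatment
    ("C", "T", "C"): "5hmC",  # converted only after oxidation to 5fC
    ("T", "T", "C"): "5fC",   # protected only after reduction to 5hmC
}


def infer_cytosine(bs: str, oxbs: str, redbs: str) -> str:
    """Infer the cytosine type at one position from three C/T calls."""
    key = (bs, oxbs, redbs)
    if key not in CALLS:
        raise ValueError(f"calls {key} do not match any cytosine type")
    return CALLS[key]


if __name__ == "__main__":
    print(infer_cytosine("C", "T", "C"))
```

No single treatment separates all four bases, which is why each position has to be read under more than one chemistry.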
Summary of the invention
The inventors have developed a set of molecular tag adaptors allowing the history of a sample to be followed. The adaptors have a region having more than one type of cytosine base. The use of standard adaptors without different cytosine bases does not allow the history of the sample to be followed as the adaptor sequences are not traceable. The tag indexes of the invention, on treatment with BS, oxBS or redBS, become unique post-conversion markers (convertible tags) and show what has happened to the sample. The use of multiple tag index sequences allows a plurality of different samples to be analysed in parallel. The use of a single tag adaptor allows a single library sample to be split into sub-aliquots and each sub-aliquot processed through a different conversion chemistry. The different conversion chemistry causes different conversions to happen to the different cytosine bases within the tag sequence. Thus the profile of each molecule in the sequencing run can be determined from the resultant tag sequence, which changes depending on the history of chemical exposure. The separately processed samples can thus be pooled together for sequence analysis. Each treated sample can be unambiguously resolved from the pool, and demultiplexed into separate bins determined by sample and conversion type. The benefits of the tags include reducing the cost of library construction and sequencing. A single library serves all conversion chemistries, so there is no need for any bisulfite-specific sample preparation steps. Further advantages include reducing the sources of technical errors and variability induced by the vagaries of library construction. All converted samples share the same, identical starting library, eliminating any library to library construction differences.
An example of one particular tag sequence of the invention is shown below. The sequence is a 10-mer with 4 cytosine bases, one of each type (C, mC, hmC and fC). When the single sequence is treated under different conditions, the sequence changes in different ways reflecting the conditions to which the sample has been exposed.
Figure imgf000005_0001
Table 1. Demonstrating the conversion effect of the BS, oxBS and redBS treatments on an example convertible tag.
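The conversion effect on a tag can also be sketched programmatically. The block below is a minimal illustration under stated assumptions: the tag `CAGCAGCGTC` is taken from the sequences listed in this document, but the order of modifications assigned to its four C positions is hypothetical, as are the names `READ_AS` and `convert_tag`. Uracil is represented as T, as it would appear in sequencing reads.

```python
# Illustrative sketch; the modification order used in the demo is a
# hypothetical assignment and the names are not from the source.
# For each treatment, the base that each cytosine type is read as.
READ_AS = {
    "native": {"C": "C", "5mC": "C", "5hmC": "C", "5fC": "C"},
    "BS":     {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T"},
    "oxBS":   {"C": "T", "5mC": "C", "5hmC": "T", "5fC": "T"},
    "redBS":  {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "C"},
}


def convert_tag(tag, mods, treatment):
    """Return the sequence read from `tag` after `treatment`.

    `mods` lists the cytosine type at each C of `tag`, in 5' to 3' order.
    """
    if tag.count("C") != len(mods):
        raise ValueError("one modification is required per C in the tag")
    table = READ_AS[treatment]
    remaining = iter(mods)
    # Non-cytosine bases pass through unchanged; each C consumes one entry of mods.
    return "".join(table[next(remaining)] if base == "C" else base for base in tag)


if __name__ == "__main__":
    tag = "CAGCAGCGTC"
    mods = ["C", "5mC", "5hmC", "5fC"]
    for treatment in READ_AS:
        print(f"{treatment:>6}: {convert_tag(tag, mods, treatment)}")
```

With this assignment the four read-outs are CAGCAGCGTC (native), TAGCAGCGTT (BS), TAGCAGTGTT (oxBS) and TAGCAGCGTC (redBS), so each treatment leaves a distinct signature in the tag.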
The disclosure includes a nucleic acid adaptor having a tag sequence having at least two different cytosine bases, including one or more modified cytosine bases. The tag sequence has a first cytosine base selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second cytosine base which is a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine. The term different in this context means chemically or structurally different, not just in an alternative location or of alternative sequence.
The disclosure includes a nucleic acid adaptor having a tag sequence having at least three chemically different cytosine bases, including two or more modified cytosine bases. The tag sequence has a first cytosine base which is unmodified cytosine, a second cytosine base which is 5-methylcytosine, and a third cytosine base which has a nucleotide selected from 5-formylcytosine or 5-hydroxymethylcytosine. The term different in this context means chemically or structurally different, not just in an alternative location or of alternative sequence.
The adaptor may have at least 4 cytosine bases, including 4 chemically different cytosine bases. The four different cytosine bases may be a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base and a 5-hydroxymethylcytosine base.
The tag sequence on the adaptor may be 5-20 bases in length. The tag sequence on the adaptor may be 6-12 bases in length. The tag sequence may be 10 bases in length. The design of the tag may be such that the different cytosine bases are not adjacent in the sequence. The tag may be of type C1XC2XC3XC4X where C1-4 are the different cytosines, and X is one or more nucleotides selected from T, A or G. X can be such that the sequence contains a purine (A or G). The adaptor can be constructed such that each of the cytosine bases is separated by at least one purine.
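The design rules in this paragraph can be expressed as a simple check. This is a sketch under assumptions rather than the selection protocol of the disclosure: the function name `check_tag`, its parameters and its defaults are hypothetical, and the purine rule is made optional because the text presents it as something the design "can" satisfy.

```python
# Illustrative sketch; function name, parameters and defaults are assumptions.
PURINES = set("AG")


def check_tag(tag, min_len=5, max_len=20, n_cytosines=4, require_purine=True):
    """Return a list of design problems found in `tag` (empty if none)."""
    problems = []
    if not min_len <= len(tag) <= max_len:
        problems.append(f"length {len(tag)} outside {min_len}-{max_len}")
    if set(tag) - set("ACGT"):
        problems.append("contains characters other than A, C, G, T")
    c_positions = [i for i, base in enumerate(tag) if base == "C"]
    if len(c_positions) < n_cytosines:
        problems.append(f"fewer than {n_cytosines} cytosines")
    # Examine the bases lying between each pair of neighbouring cytosines.
    for left, right in zip(c_positions, c_positions[1:]):
        gap = tag[left + 1:right]
        if not gap:
            problems.append(f"adjacent cytosines at positions {left} and {right}")
        elif require_purine and not PURINES & set(gap):
            problems.append(f"no purine between cytosines at {left} and {right}")
    return problems


if __name__ == "__main__":
    for candidate in ("CAGCAGCGTC", "CCAGCAGC", "TCTCAGCGAC"):
        print(candidate, check_tag(candidate) or "ok")
```

Passing `require_purine=False` keeps only the non-adjacency rule, which is useful for tags in which two cytosines are separated by a single T.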
Sequences for the tag sequences may be selected from one or more of the sequences listed below:
CAGCAGCGTC
CTGCGACGAC
GCAGCGACGC
GCGACGCAGC
CGCAGCAGCG
CACTGCGACG
CAGCGTCAGC
CGACTGCGAC
TCGACGTCAC
CGACGACGCG
CGTCAGCAGC
GCTGCAGCAC
GCACGCAGCG
CAGCGCAGCT
CACGACGTCT
TCACTGCAGC
GCGACACGCG
CTGCAGCACT
TCACGACGTC
GCGCAGCGCA
ACGTCGCGAC
ACGCAGCTGC
TCTCAGCGAC
CGACACTGCT
In the sequences listed, each C is one of a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base or a 5-hydroxymethylcytosine base, such that each sequence has one of each different type of modification. The modifications can be in any order such that any of the C bases can be any modification, providing all four modifications are present in any 10-mer, and no sequence has two identical C bases.
The tag sequence may be part of a larger oligonucleotide. For example, the tagged adaptor may also have a further region for hybridising a primer. The larger oligonucleotide adaptor may be part of a single stranded or double stranded adaptor attached to the end of the nucleic acid fragments to be analysed. In order to be used in bisulfite sequencing, the adaptor sequences may contain methylated C bases instead of C bases. Conventional C nucleotides become uracil bases upon bisulfite treatment, and it may be advantageous to avoid transforming the adaptors with bisulfite. The adaptors may therefore contain G, A, T and methyl C bases, with the tag regions only having C, formyl C or hydroxymethyl C bases. Examples of types of adaptors carrying tag sequences (shown as 9 or 10-mers in the examples, not to be limited to 9 or 10-mers in reality), are shown in the example below. As can be seen, the adaptors having the tags are larger than just the tags. The term tagged adaptors refers to the adaptors having the tag sequences included therein.
Figure imgf000007_0001
Each cytosine base is shown as '5' in the sequences above, indicating 5-methylcytosine. The adaptors are 'forked adaptors' having a region of double stranded sequence and two regions of single stranded sequence. Each adaptor contains a tag in one single stranded region of the 'fork'. Thus each adaptor in the example above contains a first strand where every C base is 5-methyl C, and a second strand where every C base except the C bases in the tag is 5-methyl C. The term adaptor can apply to either the single stranded oligonucleotides or the hybridised pair of oligonucleotides. The invention also includes a nucleic acid sample labelled with a nucleic acid adaptor as herein described. The sample may be fragmented prior to attachment of the tagged adaptor.
Also included are kits comprising multiple different sequences where each adaptor has two or more different cytosines. Included are kits comprising a first nucleic acid adaptor having a tag sequence of 5-20 bases including at least two different cytosine bases in the tag and a second nucleic acid adaptor having a different tag sequence of 5-20 bases including at least two different cytosine bases in the tag. The kit may have more than 2 adaptors. For example the kit may have adaptors with at least 4 different tag sequences, each having at least two different cytosine bases in the tag. The kit may have adaptors with at least 10 different tag sequences, each having at least two different cytosine bases in the tag. The kit may have adaptors with at least 24 different tag sequences, each having at least two different cytosine bases in the tag. The sequences of the tags may be selected from the sequences shown above.
Also included are kits comprising multiple different sequences where each adaptor has three or more different cytosines. Included are kits comprising a first nucleic acid adaptor having a tag sequence of 5-20 bases including at least three different cytosine bases in the tag and a second nucleic acid adaptor having a different tag sequence of 5-20 bases including at least three different cytosine bases in the tag. The kit may have more than 2 adaptors. For example the kit may have adaptors with at least 4 different tag sequences, each having at least three different cytosine bases in the tag. The kit may have adaptors with at least 10 different tag sequences, each having at least three different cytosine bases in the tag. The kit may have adaptors with at least 24 different tag sequences, each having at least three different cytosine bases in the tag. The sequences of the tags may be selected from the sequences shown above.
Disclosed is a method of using the adaptors for determining the methylation profile of a nucleic acid sample. The method may include the following steps;
a) preparing a nucleic acid adaptor having a tag sequence having at least two cytosine bases, including one or more modified cytosine bases, wherein a first cytosine base has a nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second cytosine base has a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with bisulfite,
e) mixing the bisulfite treated fragments with untreated fragments, and
f) sequencing the mixture of fragments, including the tags.
The method may include additional steps. The method may include the step of oxidising and/or reducing the sample prior to bisulfite treatment. The bisulfite treatment of the reduced or oxidised sample may take place separately, or with the sample mixed together. The method may include the steps of;
a) preparing a nucleic acid adaptor having at least 4 chemically different cytosine bases,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with an oxidising agent,
e) treating a second portion of the tagged fragments with a reducing agent,
f) treating the oxidised first portion, reduced second portion and a third portion with bisulfite,
g) mixing the bisulfite treated portions with untreated fragments; and
h) sequencing the mixture of fragments, including the tags.
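Once the pooled fragments from the steps above have been sequenced, the reads can be sorted by treatment using the tag alone. The sketch below illustrates one way this might be done and is not the authors' pipeline. It assumes that the tag is read as the first bases of each read, that tag calls are error-free, and that uracil is read as T; the names `READ_AS_T`, `expected_tags` and `demultiplex` and the example reads are hypothetical.

```python
# Illustrative sketch; assumes the tag occupies the start of each read and is
# read without errors. All names and example reads are hypothetical.

# Cytosine types that are read as T after each treatment.
READ_AS_T = {
    "native": set(),
    "BS": {"C", "5fC"},
    "oxBS": {"C", "5hmC", "5fC"},
    "redBS": {"C"},
}


def expected_tags(tag, mods):
    """Map each post-conversion tag sequence to the treatment that produces it."""
    lookup = {}
    for treatment, converted in READ_AS_T.items():
        remaining = iter(mods)
        seq = "".join(
            ("T" if next(remaining) in converted else "C") if base == "C" else base
            for base in tag
        )
        lookup[seq] = treatment
    return lookup


def demultiplex(reads, tag, mods):
    """Bin reads by treatment according to their leading tag sequence."""
    lookup = expected_tags(tag, mods)
    bins = {treatment: [] for treatment in READ_AS_T}
    bins["unassigned"] = []
    for read in reads:
        treatment = lookup.get(read[:len(tag)], "unassigned")
        bins[treatment].append(read[len(tag):])  # store the read without its tag
    return bins


if __name__ == "__main__":
    pooled = ["TAGCAGCGTTAAAA", "TAGCAGTGTTGGGG", "CAGCAGCGTCTTTT", "TAGCAGCGTCACGT"]
    for name, members in demultiplex(pooled, "CAGCAGCGTC", ["C", "5mC", "5hmC", "5fC"]).items():
        print(name, members)
```

A production demultiplexer would normally tolerate sequencing errors in the tag; exact matching is used here only to keep the sketch short.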
Detailed Description
The inventors have developed a set of molecular tag adaptors allowing the chemical exposure history of the molecules in a sample to be followed. The adaptors have a region having more than one type of cytosine base. The modified region may be the only region on the adaptor having a C base which is not 5-methylcytosine.
Standard adaptors for use in bisulfite sequencing are usually methylated, and are thus unaffected by bisulfite treatment. Cytosine bases in the adaptor become uracil bases upon bisulfite treatment. Thus the inclusion of both cytosine and methylated cytosine bases in the adaptor attached to a sample, a portion of which undergoes bisulfite treatment, allows identification of whether or not the particular molecules in the sample have undergone bisulfite treatment once the adaptors are sequenced. Similarly the use of hydroxymethylated C or formyl C bases in the adaptor allows identification of whether the samples have been oxidised or reduced prior to bisulfite treatment. The use of all four chemically different cytosine bases in the adaptor attached to a nucleic acid allows the exposure history of the strands in the sample to be followed in a single sequencing run.
The use of standard adaptors without different cytosine bases does not allow the history of the sample to be followed as the adaptor sequences are not traceable. The tag indexes, on treatment with BS, oxBS or redBS become unique post conversion (convertible tags) and show what has happened to the sample.
The use of multiple tag index sequences allows a plurality of different samples to be analysed in parallel. Thus for example a number of different samples from different biological origins can be processed in parallel. The concept of indexing samples using different molecular sequences on adaptors is known, and can be applied in context. Thus the use of say 24 different adaptors having a different order of A, G, C and T bases, where each of the 24 adaptors has more than one type of C base allows the processing and bisulfite analysis of 24 samples to be achieved in a single sequencing run. The concept of indexing is useful in areas where the analysis of small sized genomes is envisaged, for example in sequencing large numbers of microbial samples.
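When several tag sequences are used to index different samples, every combination of sample and treatment should give a different read-out. A minimal sketch of such a check follows. It assumes, purely for illustration, that all tags carry the same order of modifications and that uracil is read as T; the sample names and the helper names `convert` and `build_index` are hypothetical.

```python
# Illustrative sketch; sample names and the shared modification order are
# hypothetical, and real designs may assign modifications differently per tag.

# Cytosine types that are read as T after each treatment.
READ_AS_T = {
    "native": set(),
    "BS": {"C", "5fC"},
    "oxBS": {"C", "5hmC", "5fC"},
    "redBS": {"C"},
}


def convert(tag, mods, treatment):
    """Return the read-out of `tag` after `treatment`."""
    converted = READ_AS_T[treatment]
    remaining = iter(mods)
    return "".join(
        ("T" if next(remaining) in converted else "C") if base == "C" else base
        for base in tag
    )


def build_index(sample_tags, mods):
    """Return (lookup, collisions) over every sample/treatment combination."""
    lookup, collisions = {}, []
    for sample, tag in sample_tags.items():
        for treatment in READ_AS_T:
            seq = convert(tag, mods, treatment)
            if seq in lookup:
                # Two combinations give the same read-out and cannot be told apart.
                collisions.append((seq, lookup[seq], (sample, treatment)))
            else:
                lookup[seq] = (sample, treatment)
    return lookup, collisions


if __name__ == "__main__":
    tags = {"sample_1": "CAGCAGCGTC", "sample_2": "CTGCGACGAC"}
    lookup, collisions = build_index(tags, ["C", "5mC", "5hmC", "5fC"])
    print(len(lookup), "distinct read-outs;", len(collisions), "collisions")
```

An empty collision list indicates that the chosen tags can be pooled and still resolved by both sample and conversion type.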
The use of a single tag adaptor allows a single library sample to be split into sub-aliquots and each sub-aliquot processed through a different conversion chemistry. For example a part of the sample can be oxidised, a further part of the sample reduced, these can be treated with bisulfite along with a further part of the sample, and the bisulfite treated samples can be pooled with a further untreated part and sequenced. The different conversion chemistry causes different conversions to happen to the different cytosine bases within the tag sequence. Thus what has happened to each molecule can be seen once the tags are sequenced. The separately processed samples can be pooled together for sequence analysis. Each treated sample can then be unambiguously resolved from the pool, and demultiplexed into separate bins determined by sample and conversion type. The benefits of the tags include reducing the cost of library construction. A single library serves all conversion chemistries. Further advantages include reducing the sources of technical errors and variability induced by the vagaries of library construction. All converted samples share the same, identical starting library, eliminating any library to library construction differences.
The disclosure includes a nucleic acid adaptor having a tag sequence having at least two different cytosine bases, including one or more modified cytosine bases. The disclosure includes an adaptor having A, G, T and 5-methylC bases apart from a tag region where non-methylated C bases (C, hmC or fC bases) are present. The tag sequence has a first cytosine base selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second cytosine base which is a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine. The term different in this context means chemically or structurally different, not just in an alternative location or of alternative sequence.
The disclosure includes a nucleic acid adaptor having a tag sequence having at least three chemically different cytosine bases, including two or more modified cytosine bases. The tag sequence has a first cytosine base which is unmodified cytosine, a second cytosine base which is 5-methylcytosine, and a third cytosine base which has a nucleotide selected from 5-formylcytosine or 5-hydroxymethylcytosine. The term different in this context means chemically or structurally different, not just in an alternative location or of alternative sequence.
The tag may have at least 4 cytosine bases, including 4 chemically different cytosine bases. The four different cytosine bases may be a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base and a 5-hydroxymethylcytosine base.
The tag sequence on the adaptor may be 5-20 bases in length. The tag sequence on the adaptor may be 6-12 bases in length. The tag sequence may be 10 bases in length. The design of the tag may be such that the different cytosine bases are not adjacent in the sequence. The tag may be of type C1XC2XC3XC4X where C1-4 are the different cytosines, and X is one or more nucleotides selected from T, A or G. X can be such that the gap contains a purine (A or G). The adaptor can be constructed such that the cytosine bases are separated by at least one purine.
Sequences for the tag sequences may be selected from one or more of the sequences listed below:
CAGCAGCGTC
CTGCGACGAC
GCAGCGACGC
GCGACGCAGC
CGCAGCAGCG
CACTGCGACG
CAGCGTCAGC
CGACTGCGAC
TCGACGTCAC
CGACGACGCG
CGTCAGCAGC
GCTGCAGCAC
GCACGCAGCG
CAGCGCAGCT
CACGACGTCT
TCACTGCAGC
GCGACACGCG
CTGCAGCACT
TCACGACGTC
GCGCAGCGCA
ACGTCGCGAC
ACGCAGCTGC
TCTCAGCGAC
CGACACTGCT
In the sequences listed, each C is one of a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base or a 5-hydroxymethylcytosine base, such that each sequence has one of each different type of modification. The modifications can be in any order such that any of the C bases can be any modification, providing all four modifications are present in any 10-mer, and no sequence has two chemically identical C bases.
The tag sequence may be part of a larger oligonucleotide. For example, the tagged adaptor may also have a further region for hybridising a primer. The larger oligonucleotide adaptor may be part of a single stranded or double stranded adaptor attached to the end of the nucleic acid fragments to be analysed. The adaptors may therefore contain G, A, T and methyl C bases, with the tag regions only having C, formyl C or hydroxymethyl C bases.
The invention also includes a nucleic acid sample labelled with a nucleic acid adaptor as herein described. The sample may be fragmented prior to attachment of the tagged adaptor.
The population of nucleic acid molecules may be a sample of DNA or RNA, for example a genomic DNA sample. Suitable DNA and RNA samples may be obtained or isolated from a sample of cells, for example, mammalian cells such as human cells or tissue samples, such as biopsies. In some embodiments, the sample may be obtained from a formalin fixed paraffin embedded (FFPE) tissue sample. Suitable cells include somatic and germ-line cells.
The population may be a diverse population of nucleic acid molecules, for example a library, such as a whole genome library or a loci specific library.
Nucleic acid strands in the population may be amplified nucleic acid molecules, for example, amplified fragments of the same genetic locus or region from different samples.
Nucleic acid strands in the population may be enriched. For example, the population may be an enriched subset of a sample produced by pull-down onto a hybridisation array or digestion with a restriction enzyme.
The samples having the tagged adaptors may be further processed, for example by amplification or sequencing. The joined oligonucleotides may be copied using a nucleic acid polymerase. If adaptors are attached to both ends of the target fragments, the population of fragments can be amplified using a single pair of primers complementary to the adaptors.
The tags can also be used to help identify sequences from different sources. If adaptors are used with different sequences for different sources of biological materials, then the different sources can be pooled but still identified via the tag when the tags are sequenced. Thus the disclosure herein includes the use of two or more different populations of adaptors for the multiplexing of the analysis of different samples. Disclosed herein therefore are kits containing two or more adaptors of different sequence.
The sequence of the adaptor oligonucleotide depends on the specific application and suitable adaptor oligonucleotides may be designed using known techniques. A suitable adaptor oligonucleotide may, for example, consist of 20 to 100 nucleotides. The sequence of the adaptor may be selected to be complementary to a suitable amplification/extension primer.
The method may be used in order to prepare samples for nucleic acid sequencing. The method may be used to sequence a population of synthetic oligonucleotides, for example for the purposes of quality control. Alternatively, the first oligonucleotides may come from a population of nucleic acid molecules from a biological sample. The population may be fragments of between 100-10000 nucleotides in length. The fragments may be 200-1000 nucleotides in length. The fragments may be of random variable sequence. The order of bases in the sequence may be known, unknown, or partly known. The fragments may come from treating a biological sample to obtain fragments of shorter length than exist in the naturally occurring sample. The fragments may come from a random cleavage of longer strands. The fragments may be derived from shearing the sample using a physical method such as hydrodynamic shearing. The fragments may be derived from treating a nucleic acid sample with a chemical reagent (for example sodium bisulfite, acid or alkali) or enzyme (for example with a restriction endonuclease or other nuclease). The fragments may come from a treatment step that causes double stranded molecules to become single stranded.
Methods of the invention may be useful in preparing a population of nucleic acid strands for sequencing, for example a population of bisulfite-treated single-stranded nucleic acid fragments. Bisulfite treatment produces single-stranded nucleic acid fragments, typically of about 250-1000 nucleotides in length. The population may be treated with bisulfite by incubation with bisulfite ions (HSO3⁻). The use of bisulfite ions (HSO3⁻) to convert unmethylated cytosines in nucleic acids into uracil is standard in the art and suitable reagents and conditions are well known. Numerous suitable protocols and reagents are also commercially available (for example, EpiTect™, Qiagen NL; EZ DNA Methylation™, Zymo Research Corp CA; CpGenome Turbo Bisulfite Modification Kit, Millipore; TrueMethyl™, Cambridge Epigenetix, UK).
The methods disclosed may further include the step of producing one or more copies of the first single stranded oligonucleotides. The methods may include producing multiple copies of each of the different sequences. The copies may be made by hybridising a primer sequence opposite a universal sequence on the second oligonucleotide sequence, and using a nucleic acid polymerase to synthesise a complementary copy of the first single stranded sequences. The production of the complementary copy provides a double stranded polynucleotide.
The double stranded polynucleotides can be amplified using primers complementary to both strands. The amplification can be locus-specific. Locus specific amplification only amplifies a selection of the fragments in the pool and is therefore a selective amplification for certain sequences. Alternatively adaptor sequences can be attached to both ends of the fragments. The attachment of known adaptors at both ends of each fragment can allow amplification of all the fragments in the pool as each fragment possesses two universal ends.
Alternatively the double stranded polynucleotides may be made circular by attaching the ends together. In some embodiments, double stranded molecules produced by extension of a primer annealed to the adaptor sequence may be circularised by ligation. This may be useful in the generation of circular nucleic acid constructs and plasmids or in the preparation of samples for sequencing using platforms that employ circular templates (e.g. PacBio SMRT sequencing). In some embodiments, populations of circularised 3' adapted nucleic acid fragments produced as described herein may be denatured and subjected to rolling circle or whole genome amplification using an amplification primer that hybridises to the 3'-adaptor oligonucleotide to produce a population of concatemeric products. Amplification of circular fragments can be carried out using primers complementary to two regions of the single adaptor sequence.
An alternative to locus specific amplification is the use of random priming. Random priming is used in techniques such as whole genome amplification (WGA). Having a universal primer on one end of a population of single stranded fragments and a random primer on the opposite end means that amplification is more efficient than having random primers on both ends, as is the case with WGA.
The tagged adaptor joined fragments can be used in any subsequent method of sequence determination. For example, the fragments can undergo parallel sequencing on a solid support. In such cases the attachment of universal adaptors to each end may be beneficial in the amplification of the population of fragments. Suitable sequencing methods are well known in the art, and include Illumina sequencing, pyrosequencing (for example 454 sequencing) or Ion Torrent sequencing (from Life Technologies™).
Populations of nucleic acid molecules with a 3' adaptor oligonucleotide and optionally a 5' second adaptor oligonucleotide may be sequenced directly. For example, the sequences of the first and second adaptor oligonucleotides may be specific for a sequencing platform. For example, they may be complementary to the flowcell or device on which sequencing is to be performed. This may allow the sequencing of the population of nucleic acid fragments without the need for further amplification and/or adaptation. The first and second adaptor sequences are different. Preferably, the adaptor sequences and tag sequences are not found within the human genome.
The nucleic acid strands in the population may have the same first adaptor sequence at their 3' ends and the same second adaptor sequence at their 5' ends i.e. all of the fragments in the population may be flanked by the same pair of adaptor sequences. In such cases both strands in the duplex carry a tag sequence.
Suitable adaptor oligonucleotides for the production of nucleic acid strands for sequencing may include a region that is complementary to the universal primers on the solid support (e.g. a flowcell or bead) and a region that is complementary to universal sequencing primers (i.e. which when annealed to the adaptor oligonucleotide and extended allows the sequence of the nucleic acid molecule to be read). Suitable nucleotide sequences for these interactions are well known in the art and depend on the sequencing platform to be employed. Suitable sequencing platforms include Illumina TruSeq, LifeTech IonTorrent, Roche 454 and PacBio RS.
For example, the sequences of the first and second adaptor oligonucleotides may comprise a sequence that hybridises to complementary primers immobilised on the solid support (e.g. 20-30 nucleotides); a sequence that hybridises to a sequencing primer (e.g. 30-40 nucleotides) and a unique index sequence (e.g. 6-10 nucleotides). Suitable first and second adaptor oligonucleotides may be 56-80 nucleotides in length. The adaptors may be configured as single strands containing both DNA and RNA, or as two or three strands.
Following adaptation and/or labelling as described herein, the nucleic acid molecules may be purified by any convenient technique. Following preparation, the population of nucleic acid molecules may be provided in a suitable form for further treatment as described herein. For example, the population of nucleic acid molecules may be in aqueous solution in the absence of buffers before treatment as described herein.
In other embodiments, populations of nucleic acid molecules with a 3' adaptor oligonucleotide and optionally a 5' adaptor oligonucleotide, may be further adapted and/or amplified as required, for example for a specific application or sequencing platform. Preferably, the nucleic acid strands in the population may have the same first adaptor sequence at their 3' ends and the same second adaptor sequence at their 5' ends i.e. all of the fragments in the population may be flanked by the same pair of adaptors, as described above. This allows the same pair of amplification primers to amplify all of the strands in the population and avoids the need for multiplex amplification reactions using complex sets of primer pairs, which are susceptible to mis-priming and the amplification of artefacts.
Suitable first and second amplification primers may be 20-25 nucleotides in length and may be designed and synthesised using standard techniques. For example, a first amplification primer may hybridise to the first adaptor sequence i.e. the first amplification primer may comprise a nucleotide sequence complementary to the first adaptor oligonucleotide; and a second amplification primer may hybridise to the complement of the second adaptor sequence i.e. the second amplification primer may comprise the nucleotide sequence of the second adaptor oligonucleotide. Alternatively, a first amplification primer may hybridise to the complement of the first adaptor sequence i.e. the first amplification primer may comprise the nucleotide sequence of the first adaptor oligonucleotide; and a second amplification primer may hybridise to the second adaptor sequence i.e. the second amplification primer may comprise a nucleotide sequence complementary to the second adaptor oligonucleotide.
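As an illustration of the first arrangement described above, the short Python sketch below derives a primer pair from a pair of adaptor sequences. The adaptor sequences, the function names and the choice of which 20 bases of each adaptor to use are all assumptions made for the example; a real design would also consider melting temperature and secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def amplification_primers(first_adaptor, second_adaptor, length=20):
    """Derive a primer pair for strands laid out as
    5'-[second adaptor]-[fragment]-[first adaptor]-3'.
    """
    if min(len(first_adaptor), len(second_adaptor)) < length:
        raise ValueError("adaptors must be at least as long as the primers")
    # Anneals to the first (3') adaptor, so it is complementary to it.
    first_primer = reverse_complement(first_adaptor[-length:])
    # Anneals to the complement of the second (5') adaptor,
    # so it carries the same sequence as that adaptor.
    second_primer = second_adaptor[:length]
    return first_primer, second_primer

# Hypothetical adaptor sequences, for illustration only.
FIRST_ADAPTOR = "GATTACAGGTCAAGCTTGCATGACCTGA"
SECOND_ADAPTOR = "TTGACCATGCAAGGTCTCAGTACGGATC"

first_primer, second_primer = amplification_primers(FIRST_ADAPTOR, SECOND_ADAPTOR)
print(first_primer, second_primer)
```

Because every strand in the population is flanked by the same two adaptors, this single pair would be expected to amplify all of them.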
In some embodiments, the first and second amplification primers may incorporate additional sequences. Additional sequences may include index sequences to allow identification of the amplification products during multiplex sequencing, or further adaptor sequences to allow sequencing of the strands using a specific sequencing platform.
In some embodiments, a portion of the nucleic acid sample may be oxidised using an oxidising agent. The oxidising agent may be a non-enzymatic oxidising agent, for example, an organic or inorganic chemical compound. Suitable oxidising agents are well known in the art and include metal oxides, such as KRuO4, MnO2 and KMnO4. Particularly useful oxidising agents are those that may be used in aqueous conditions, which are most convenient for the handling of the polynucleotide. However, oxidising agents that are suitable for use in organic solvents may also be employed where practicable. In some embodiments, the oxidising agent may comprise a perruthenate anion (RuO4⁻). Suitable perruthenate oxidising agents include organic and inorganic perruthenate salts, such as potassium perruthenate (KRuO4) and other metal perruthenates; tetraalkyl ammonium perruthenates, such as tetrapropylammonium perruthenate (TPAP) and tetrabutylammonium perruthenate (TBAP); polymer-supported perruthenate (PSP) and tetraphenylphosphonium ruthenate. The oxidising agents may be a metal (VI) oxo complex. The oxidising agent may be manganate (Mn(VI)O4²⁻), ferrate (Fe(VI)O4²⁻), osmate (Os(VI)O4²⁻), ruthenate (Ru(VI)O4²⁻), or molybdate (Mo(VI)O4²⁻).
Advantageously, the oxidising agent or the oxidising conditions may also preserve the polynucleotide in a denatured state.
Following treatment with the oxidising agent, the polynucleotides in the first portion may be purified.
Purification may be performed using any convenient nucleic acid purification technique. Suitable nucleic acid purification techniques include spin-column chromatography.
The polynucleotide may be subjected to further, repeat oxidising steps. Such steps are undertaken to maximise the conversion of 5-hydroxymethylcytosine to 5-formylcytosine. This may be necessary where a polynucleotide has sufficient secondary structure that it is capable of re-annealing. Any annealed portions of the polynucleotide may limit or prevent access of the oxidising agent to that portion of the structure, which has the effect of protecting 5-hydroxymethylcytosine from oxidation.
In some embodiments, the portion of the population of polynucleotides may for example be subjected to multiple cycles of treatment with the oxidising agent followed by purification. For example, one, two, three or more than three cycles may be performed.
In some embodiments, a portion of the population of polynucleotides comprising the sample nucleotide sequence may be reduced. In other embodiments, a further portion of the population of polynucleotides comprising the sample nucleotide sequence may be reduced. Reduction converts 5-formylcytosine residues in the sample nucleotide sequence into 5-hydroxymethylcytosine. The portions of polynucleotides may be reduced by treatment with a reducing agent. The reducing agent is any agent suitable for generating an alcohol from an aldehyde. The reducing agent or the conditions employed in the reduction step may be selected so that any 5-formylcytosine is selectively reduced (i.e. the reducing agent or reduction conditions are selective for 5-formylcytosine). Thus, substantially no other functionality in the polynucleotide is reduced in the reduction step. The reducing agent or conditions are selected to minimise or prevent any degradation of the polynucleotide.
Suitable reducing agents are well-known in the art and include NaBH4, NaCNBH3 and LiBH4. Particularly useful reducing agents are those that may be used in aqueous conditions, as such are most convenient for the handling of the polynucleotide. However, reducing agents that are suitable for use in organic solvents may also be employed where practicable.
Following oxidation and reduction respectively, the oxidised and reduced portions of the population are treated with bisulfite. A further portion of the population which has not been oxidised or reduced is also treated with bisulfite. The bisulfite treatment can be done separately on the three samples, or the samples can be pooled so that the reduced, oxidised and untreated samples are all exposed to bisulfite in the same reaction.
Bisulfite treatment converts both cytosine and 5-formylcytosine residues in a polynucleotide into uracil. Where any 5-carboxycytosine is present (as a product of the oxidation step), this 5-carboxycytosine is converted into uracil in the bisulfite treatment. Without wishing to be bound by theory, it is believed that the reaction of the 5-formylcytosine proceeds via loss of the formyl group to yield cytosine, followed by a subsequent deamination to give uracil. The 5-carboxycytosine is believed to yield the uracil through a sequence of decarboxylation and deamination steps. Bisulfite treatment may be performed under conditions that convert both cytosine and 5-formylcytosine or 5-carboxycytosine residues in a polynucleotide as described herein into uracil.
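The combined effect of oxidation, reduction and bisulfite treatment on each cytosine variant can be summarised as a small lookup. The Python sketch below is a simplified model under stated assumptions: the one-letter codes (C, M, H, F) and treatment labels are chosen for this example, each conversion is treated as complete, uracil is reported as T as it would appear in a sequencing read, and 5-methylcytosine and 5-hydroxymethylcytosine are taken to be read as C after bisulfite, in line with general knowledge of bisulfite chemistry. 5-Carboxycytosine is omitted.

```python
# Hypothetical one-letter codes used only in this sketch:
# C = cytosine, M = 5-methylcytosine,
# H = 5-hydroxymethylcytosine, F = 5-formylcytosine
OXIDATION = {"H": "F"}   # oxidation converts 5hmC to 5fC
REDUCTION = {"F": "H"}   # reduction converts 5fC to 5hmC
BISULFITE_READ = {"C": "T", "F": "T", "M": "C", "H": "C"}  # U is read as T

def read_out(base, treatment):
    """Base call expected for a cytosine variant after a given treatment.

    treatment is one of "native", "BS", "oxBS" or "redBS".
    """
    if treatment == "native":
        return "C"  # all four variants read as C without conversion
    if treatment == "oxBS":
        base = OXIDATION.get(base, base)
    elif treatment == "redBS":
        base = REDUCTION.get(base, base)
    elif treatment != "BS":
        raise ValueError(f"unknown treatment: {treatment}")
    return BISULFITE_READ[base]

for treatment in ("native", "BS", "oxBS", "redBS"):
    calls = "".join(read_out(base, treatment) for base in "CMHF")
    print(f"{treatment:>6}: C,M,H,F read as {calls}")
```

In this model the four treatments give four different read-out patterns (CCCC, TCCT, TCTT and TCCC), which is why a tag carrying one of each cytosine variant can report how the attached fragment was treated.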
A portion of the population may be treated with bisulfite by incubation with bisulfite ions (HSO3⁻). The use of bisulfite ions (HSO3⁻) to convert unmethylated cytosines in nucleic acids into uracil is standard in the art and suitable reagents and conditions are well known to the skilled person. Numerous suitable protocols and reagents are also commercially available (for example, EpiTect™, Qiagen NL; EZ DNA Methylation™, Zymo Research Corp CA; CpGenome Turbo Bisulfite Modification Kit, Millipore).
Also included are kits comprising multiple different sequences where each adaptor has two or more different cytosines. Included are kits comprising a first nucleic acid adaptor having a tag sequence of 5-20 bases including at least two different cytosine bases in the tag and a second nucleic acid adaptor having a different tag sequence of 5-20 bases including at least two different cytosine bases in the tag. The kit may have more than 2 adaptors. For example the kit may have adaptors with at least 4 different tag sequences, each having at least two different cytosine bases in the tag. The kit may have adaptors with at least 10 different tag sequences, each having at least two different cytosine bases in the tag. The kit may have adaptors with at least 24 different tag sequences, each having at least two different cytosine bases in the tag. The sequences of the tags may be selected from the sequences shown below:
CAGCAGCGTC
CTGCGACGAC
GCAGCGACGC
GCGACGCAGC
CGCAGCAGCG
CACTGCGACG
CAGCGTCAGC
CGACTGCGAC
TCGACGTCAC
CGACGACGCG
CGTCAGCAGC
GCTGCAGCAC
GCACGCAGCG
CAGCGCAGCT
CACGACGTCT
TCACTGCAGC
GCGACACGCG
CTGCAGCACT
TCACGACGTC
GCGCAGCGCA
ACGTCGCGAC
ACGCAGCTGC
TCTCAGCGAC
CGACACTGCT
In the sequences shown above, each C base in each tag is different; one is C, one is methyl C, one is hydroxymethyl C and one is formyl C. The kits may contain one or more nucleic acid acting enzymes such as nucleic acid polymerases or ligases. The adaptor kits may contain a further nucleic acid complementary to a region of the first adaptor such that the adaptor is at least partly double stranded.
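One simple way to vary the order of the four cytosine variants between the tags in a kit is to give each tag a different permutation. The Python sketch below is a hypothetical illustration; the function name and the labels in `MODS` are assumptions of this example, and the pairing of tags with orders is arbitrary.

```python
from itertools import permutations

# Labels for the four cytosine variants (chosen for this sketch).
MODS = ("C", "mC", "hmC", "fC")

def assign_modification_orders(tags):
    """Pair each tag with a different left-to-right order of the variants."""
    orders = list(permutations(MODS))  # 4! = 24 distinct orders
    if len(tags) > len(orders):
        raise ValueError("more tags than distinct modification orders")
    return {tag: order for tag, order in zip(tags, orders)}

kit = assign_modification_orders(["CAGCAGCGTC", "CTGCGACGAC", "GCAGCGACGC"])
for tag, order in kit.items():
    print(tag, order)
```

Four variants give 24 possible orders, so a kit of up to 24 tags with four cytosines each can be built without any two tags sharing the same order.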
The tag sequences may be selected using the protocol below, or modifications thereto:
1. Generate all possible kmers
• A kmer of length 10 nt has 1048576 (4^10) combinations.
2. Filter by base composition requirements
Minimum C Check
• Each tag must contain at least 4 cytosines to represent each modification (C, mC, hmC, fC).
Base Balance Check
• Tags should not be over represented (>40%) by any particular base, or combination of bases.
Colour Channel Check
• The AC proportion or GT proportion must not exceed 60%.
• AC and GT are in different colour channels (red and green respectively).
Homopolymer Check
• Each tag is filtered to avoid runs of the same base. G and C must not be consecutive with themselves (no GG or CC).
• A and T must not have runs of more than two pairs.
After the initial checks there are 11863 remaining 10mers.
3. Hamming Distance Check (measure of sequence similarity)
• No two tags should have a Hamming distance (edit distance) of three or less.
• An all-against-all comparison of remaining 10mers allows them to be ranked by Hamming distance.
4. Selection of the top 24 ranked tags
• The top ranked 10mers represent the most diverse set by sequence.
5. Apply modifications to Cytosines
• Avoid same order of modifications as this could introduce a bias.
6. A final Hamming distance check on the 24 convertible tags
Hamming distance check for final set of modified tags (0=identical, 10=no similarity)
[Figure: pairwise Hamming distance table for the final set of modified tags; image not reproduced]
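Steps 1 to 4 of the selection protocol can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used to produce the listed tags: the function names are hypothetical, the base balance check is applied to single bases only, the A/T rule is interpreted here as forbidding runs of three or more, and a simple greedy pass stands in for the ranking step, whose details are not given. It is not claimed to reproduce the counts reported above.

```python
from itertools import product

def passes_composition_filters(tag):
    """Apply the minimum C, base balance, colour channel and homopolymer checks."""
    n = len(tag)
    counts = {base: tag.count(base) for base in "ACGT"}
    if counts["C"] < 4:                                  # minimum C check
        return False
    if any(100 * c > 40 * n for c in counts.values()):   # no base above 40%
        return False
    ac = counts["A"] + counts["C"]
    gt = counts["G"] + counts["T"]
    if 100 * ac > 60 * n or 100 * gt > 60 * n:           # colour channel check
        return False
    if "CC" in tag or "GG" in tag:                       # no consecutive C or G
        return False
    if "AAA" in tag or "TTT" in tag:                     # assumed A/T run limit
        return False
    return True

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))

def candidate_tags(k=10):
    """Yield every k-mer (4**k are scanned) that passes the composition filters."""
    for bases in product("ACGT", repeat=k):
        tag = "".join(bases)
        if passes_composition_filters(tag):
            yield tag

def pick_diverse(candidates, n_tags=24, min_distance=4):
    """Greedily keep tags that differ from every kept tag at min_distance or more positions."""
    chosen = []
    for tag in candidates:
        if all(hamming(tag, other) >= min_distance for other in chosen):
            chosen.append(tag)
            if len(chosen) == n_tags:
                break
    return chosen
```

A call such as `pick_diverse(candidate_tags(10))` would scan all 1048576 10mers; the set it returns depends on enumeration order and will generally differ from the 24 tags listed above, although the listed tags do pass `passes_composition_filters` as written.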
The tags may be incorporated into adaptors used for attachment to nucleic acid fragments for sequencing. Disclosed is a method of using the tagged adaptors for determining the methylation profile of a nucleic acid sample. The method may include the following steps;
a) preparing a nucleic acid adaptor having a tag sequence having at least two cytosine bases, including one or more modified cytosine bases, wherein a first cytosine base has a nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second cytosine base has a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with bisulfite,
e) mixing the bisulfite treated fragments with untreated fragments, and
f) sequencing the mixture of fragments, including the tags.
The method may include additional steps. The method may include the step of oxidising and/or reducing the sample prior to bisulfite treatment. The bisulfite treatment of the reduced or oxidised sample may take place separately, or with the samples mixed together. The method may include the steps of;
a) preparing a nucleic acid adaptor having at least 4 chemically different cytosine bases,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with an oxidising agent,
e) treating a second portion of the tagged fragments with a reducing agent,
f) treating the oxidised first portion, reduced second portion and a third portion with bisulfite,
g) mixing the bisulfite treated portions with untreated fragments; and
h) sequencing the mixture of fragments, including the tags.
The method may include the step of oxidising the sample prior to bisulfite treatment. The method may include the steps of;
a) preparing a nucleic acid adaptor having at least three chemically different cytosine bases,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with an oxidising agent,
e) treating the oxidised first portion and a second portion with bisulfite
f) mixing the bisulfite treated portions with untreated fragments; and
g) sequencing the mixture of fragments, including the tags.
The oxidising agent may be a non-enzymatic oxidising agent, for example, an organic or inorganic chemical compound. Suitable oxidising agents are well known in the art and include metal oxides, such as KRuO4, MnO2 and KMnO4. Particularly useful oxidising agents are those that may be used in aqueous conditions, which are most convenient for the handling of the polynucleotide. However, oxidising agents that are suitable for use in organic solvents may also be employed where practicable. In such cases the three different cytosine bases may be cytosine, 5-methylcytosine and 5-hydroxymethylcytosine.
In some embodiments, the oxidising agent may comprise a perruthenate anion (RuO4⁻). Suitable perruthenate oxidising agents include organic and inorganic perruthenate salts, such as potassium perruthenate (KRuO4) and other metal perruthenates; tetraalkyl ammonium perruthenates, such as tetrapropylammonium perruthenate (TPAP) and tetrabutylammonium perruthenate (TBAP); polymer-supported perruthenate (PSP) and tetraphenylphosphonium ruthenate. The oxidising agents may be a metal (VI) oxo complex. The oxidising agent may be manganate (Mn(VI)O4²⁻), ferrate (Fe(VI)O4²⁻), osmate (Os(VI)O4²⁻), ruthenate (Ru(VI)O4²⁻), or molybdate (Mo(VI)O4²⁻).
The method may include the step of reducing the sample prior to bisulfite treatment. The method may include the steps of;
a) preparing a nucleic acid adaptor having at least three chemically different cytosine bases,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with a reducing agent,
e) treating the reduced first portion and a second portion with bisulfite
f) mixing the bisulfite treated portions with untreated fragments; and
g) sequencing the mixture of fragments, including the tags.
The reducing agent may be borohydride. Suitable reducing agents include NaBH4, NaCNBH3 and LiBH4. In such cases the three different cytosine bases may be cytosine, 5-methylcytosine and 5-formylcytosine.
A protocol for sequencing is shown below:
An NGS library was prepared using the standard Illumina TruSeq library construction kit but using a bespoke set of adapters in place of the standard set provided in the kit. This bespoke set of adapters had a 9mer tag in place of the standard 6mer tag used by Illumina. The sequence of the tag used was AT1ACG5CA (where 1 = 5hmC and 5 = 5mC). Libraries (ca. 2 µg) were prepared for sequencing as per the manufacturer's recommendations. The library was subdivided into thirds (ca. 400 ng each): one third (CEG04 41 4) was left native, one third (CEG04 41 5) was converted through the bisulfite-only half of the CEGX TrueMethyl kit (as per the manufacturer's protocol) and one third (CEG04 41 6) was converted through the oxidative bisulfite half of the CEGX TrueMethyl kit (as per the manufacturer's protocol). All samples were amplified using the PCR protocol in the TrueMethyl kit using the TrueMethyl polymerase. Amplicons were quantified and pooled at equimolar concentrations and used to prepare a 2 nM library solution to take forward for Illumina SBS sequencing. The pooled libraries (containing CEG04 41 4-6) were loaded on a MiSeq at a concentration of 20 pM. Cluster generation and sequencing were performed automatically using MiSeq V2 chemistry kits, standard protocol. Libraries were sequenced in a 65+9 cycle MiSeq run. Reads were demultiplexed automatically according to the index tag sequence (ATCACGCCA = CEG04 41 4; ATCATGCTA = CEG04 41 5; and ATTATGCTA = CEG04 41 6). FASTQ files for each library were generated and Bismark (http://www.bioinformatics.babraham.ac.uk/projects/bismark/) was used to align Native, BS and oxBS samples to the Lambda genome. A summary of the sequencing results is shown in Table 2. Demultiplexed reads for all samples align well to the lambda genome as expected.
Conclusion: This experiment demonstrates the use of convertible index tags to uniquely differentiate between samples derived from a common library processed using different conversion chemistries. Reads processed through BS, oxBS or untreated can be unambiguously deconvolved from a complex pool of indexed fragments.
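The index reads reported in this example follow directly from the tag AT1ACG5CA and the expected behaviour of each base under the three treatments. The Python sketch below reproduces them and bins index reads by treatment. It is a simplified illustration with hypothetical function names, it uses exact matching only, and it is not the demultiplexing software used in the experiment.

```python
TAG = "AT1ACG5CA"  # 1 = 5hmC, 5 = 5mC, as in the example above

# Expected base call for each cytosine symbol in the tag, per treatment.
READ_OUT = {
    "native": {"C": "C", "1": "C", "5": "C"},
    "BS":     {"C": "T", "1": "C", "5": "C"},
    "oxBS":   {"C": "T", "1": "T", "5": "C"},
}

def expected_index(tag, treatment):
    """Index read expected after the given treatment (A, G and T are unchanged)."""
    table = READ_OUT[treatment]
    return "".join(table.get(symbol, symbol) for symbol in tag)

# Map each expected index read back to the treatment that produces it.
EXPECTED = {expected_index(TAG, t): t for t in READ_OUT}

def demultiplex(index_reads):
    """Count index reads per treatment using exact matching."""
    bins = {treatment: 0 for treatment in READ_OUT}
    bins["unassigned"] = 0
    for read in index_reads:
        bins[EXPECTED.get(read, "unassigned")] += 1
    return bins

for index, treatment in EXPECTED.items():
    print(index, treatment)
```

The three expected index reads are ATCACGCCA, ATCATGCTA and ATTATGCTA, matching the sequences used to assign reads to the native, BS and oxBS libraries.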
[Table 2: image not reproduced]
Table 2: Summary of the mapping efficiencies of the Native, BS and oxBS treated samples

Claims

1. A nucleic acid adaptor comprising a tag sequence having at least three chemically different cytosine bases, including two or more modified cytosine bases, wherein a first cytosine base has a nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine, a second cytosine base has a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a third cytosine base has a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine.
2. The adaptor according to claim 1 wherein the tag sequence has a first cytosine base which is unmodified cytosine, a second cytosine base which is 5-methylcytosine, and a third cytosine base which has a nucleotide selected from 5-formylcytosine or 5-hydroxymethylcytosine.
3. The adaptor according to claim 1 having at least 4 cytosine bases, including 4 chemically different cytosine bases.
4. The adaptor according to claim 3 having a cytosine base, a 5-methylcytosine base, a 5-formylcytosine base and a 5-hydroxymethylcytosine base.
5. The adaptor according to any one of claims 1 to 4 wherein the tag sequence is 6-12 bases in length.
6. The adaptor according to claim 5 wherein the tag sequence is 10 bases in length.
7. The adaptor according to any one of claims 1 to 6 wherein the cytosine bases are not adjacent in the tag sequence.
8. The adaptor according to any one of claims 1 to 7 wherein the cytosine bases within the tag are separated by at least one purine.
9. The adaptor according to any one of claims 6 to 8 wherein the tag sequences are selected from
CAGCAGCGTC
CTGCGACGAC
GCAGCGACGC
GCGACGCAGC
CGCAGCAGCG
CACTGCGACG
CAGCGTCAGC
CGACTGCGAC
TCGACGTCAC
CGACGACGCG
CGTCAGCAGC
GCTGCAGCAC
GCACGCAGCG
CAGCGCAGCT
CACGACGTCT
TCACTGCAGC
GCGACACGCG
CTGCAGCACT
TCACGACGTC
GCGCAGCGCA
ACGTCGCGAC
ACGCAGCTGC
TCTCAGCGAC
CGACACTGCT
10. A nucleic acid sample labelled with a nucleic acid adaptor according to any preceding claim.
11. A kit comprising a first nucleic acid adaptor having a tag sequence of 5-20 bases including at least three different cytosine bases in the tag and a second nucleic acid adaptor having a different tag sequence of 5-20 bases including at least two different cytosine bases in the tag.
12. The kit of claim 11 comprising at least 4 different tag sequences, each having at least three different cytosine bases in the tag.
13. The kit of claim 12 comprising at least 10 different tag sequences, each having at least three different cytosine bases in the tag.
14. The kit of claim 13 comprising at least 24 different tag sequences, each having at least three different cytosine bases in the tag.
15. The kit of any one of claims 11 to 14 wherein the tag sequences are selected from
CAGCAGCGTC
CTGCGACGAC
GCAGCGACGC
GCGACGCAGC
CGCAGCAGCG
CACTGCGACG
CAGCGTCAGC
CGACTGCGAC
TCGACGTCAC
CGACGACGCG
CGTCAGCAGC
GCTGCAGCAC
GCACGCAGCG
CAGCGCAGCT
CACGACGTCT
TCACTGCAGC
GCGACACGCG
CTGCAGCACT
TCACGACGTC
GCGCAGCGCA
ACGTCGCGAC
ACGCAGCTGC
TCTCAGCGAC
CGACACTGCT
16. A method for determining the methylation of a nucleic acid sample comprising
a) preparing a nucleic acid adaptor having a tag sequence having at least two cytosine bases, including one or more modified cytosine bases, wherein a first cytosine base has a nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine and a second cytosine base has a different nucleotide selected from cytosine, 5-methylcytosine, 5-formylcytosine or 5-hydroxymethylcytosine,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with bisulfite,
e) mixing the bisulfite treated fragments with untreated fragments, and
f) sequencing the mixture of fragments, including the tags.
17. A method of claim 16 comprising
a) preparing a nucleic acid adaptor having at least 4 chemically different cytosine bases,
b) fragmenting a nucleic acid sample,
c) attaching the adaptor to the nucleic acid fragments to produce tagged fragments,
d) treating a first portion of the tagged fragments with an oxidising agent,
e) treating a second portion of the tagged fragments with a reducing agent,
f) treating the oxidised first portion, reduced second portion and a third portion with bisulfite,
g) mixing the bisulfite treated portions with untreated fragments; and
h) sequencing the mixture of fragments, including the tags.
PCT/GB2015/052183 2014-07-28 2015-07-28 Improved nucleic acid sample analysis using convertible tags WO2016016639A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP15744329.2A EP3174996A1 (en) 2014-07-28 2015-07-28 Improved nucleic acid sample analysis using convertible tags

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1413318.5 2014-07-28
GBGB1413318.5A GB201413318D0 (en) 2014-07-28 2014-07-28 Nucleic acid sample preparation

Publications (1)

Publication Number Publication Date
WO2016016639A1 true WO2016016639A1 (en) 2016-02-04

Family

ID=51587329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2015/052183 WO2016016639A1 (en) 2014-07-28 2015-07-28 Improved nucleic acid sample analysis using convertible tags

Country Status (3)

Country Link
EP (1) EP3174996A1 (en)
GB (1) GB201413318D0 (en)
WO (1) WO2016016639A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9822394B2 (en) 2014-02-24 2017-11-21 Cambridge Epigenetix Limited Nucleic acid sample preparation
US10323269B2 (en) 2008-09-26 2019-06-18 The Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US10428381B2 (en) 2011-07-29 2019-10-01 Cambridge Epigenetix Limited Methods for detection of nucleotide modification
US10563248B2 (en) 2012-11-30 2020-02-18 Cambridge Epigenetix Limited Oxidizing agent for modified nucleotides
US11410750B2 (en) 2018-09-27 2022-08-09 Grail, Llc Methylation markers and targeted methylation probe panel
US11566284B2 (en) 2016-08-10 2023-01-31 Grail, Llc Methods of preparing dual-indexed DNA libraries for bisulfite conversion sequencing
WO2023081722A3 (en) * 2021-11-02 2023-07-20 Guardant Health, Inc. Quality control method
US12018320B2 (en) 2022-02-18 2024-06-25 The Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090047680A1 (en) * 2007-08-15 2009-02-19 Si Lok Methods and compositions for high-throughput bisulphite dna-sequencing and utilities
WO2013017853A2 (en) * 2011-07-29 2013-02-07 Cambridge Epigenetix Limited Methods for detection of nucleotide modification
WO2013090588A1 (en) * 2011-12-13 2013-06-20 Oslo Universitetssykehus Hf Methods and kits for detection of methylation status
US20130244237A1 (en) * 2012-03-15 2013-09-19 New England Biolabs, Inc. Methods and Compositions for Discrimination Between Cytosine and Modifications Thereof and for Methylome Analysis
EP2698437A1 (en) * 2011-04-15 2014-02-19 Riken Method and kit for detecting 5-hydroxymethylcytosine in nucleic acids

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BARBARA STEIGENBERGER ET AL: "Synthesis of 5-Hydroxymethyl-, 5-Formyl-, and 5-Carboxycytidine-triphosphates and Their Incorporation into Oligonucleotides by Polymerase Chain Reaction", ORGANIC LETTERS, vol. 15, no. 2, 18 January 2013 (2013-01-18), pages 366 - 369, XP055215297, ISSN: 1523-7060, DOI: 10.1021/ol3033219 *
FLUSBERG BENJAMIN A ET AL: "Direct detection of DNA methylation during single-molecule, real-time sequencing", NATURE METHODS, NATURE PUBLISHING GROUP, GB, vol. 7, no. 6, 1 July 2010 (2010-07-01), pages 461 - 465, XP009142171, ISSN: 1548-7105, [retrieved on 20100509], DOI: 10.1038/NMETH.1459 *
M. J. BOOTH ET AL: "Quantitative Sequencing of 5-Methylcytosine and 5-Hydroxymethylcytosine at Single-Base Resolution", SCIENCE, vol. 336, no. 6083, 18 May 2012 (2012-05-18), pages 934 - 937, XP055064913, ISSN: 0036-8075, DOI: 10.1126/science.1220671 *
MARTIN MÜNZEL ET AL: "Improved Synthesis and Mutagenicity of Oligonucleotides Containing 5-Hydroxymethylcytosine, 5-Formylcytosine and 5-Carboxylcytosine", CHEMISTRY - A EUROPEAN JOURNAL, vol. 17, no. 49, 2 December 2011 (2011-12-02), pages 13782 - 13788, XP055215792, ISSN: 0947-6539, DOI: 10.1002/chem.201102782 *
SEUNG-GI JIN ET AL: "Examination of the specificity of DNA methylation profiling techniques towards 5-methylcytosine and 5-hydroxymethylcytosine", NUCLEIC ACIDS RESEARCH, INFORMATION RETRIEVAL LTD, vol. 38, no. 11, 1 June 2010 (2010-06-01), pages e125 - 1, XP002631408, ISSN: 0305-1048, [retrieved on 20100405], DOI: 10.1093/NAR/GKQ223 *
YUN HUANG ET AL: "The Behaviour of 5-Hydroxymethylcytosine in Bisulfite Sequencing", PLOS ONE, vol. 5, no. 1, 1 January 2010 (2010-01-01), pages e8888 - e8888, XP055097626, ISSN: 1932-6203, DOI: 10.1371/journal.pone.0008888 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10731204B2 (en) 2008-09-26 2020-08-04 Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US10774373B2 (en) 2008-09-26 2020-09-15 Children's Medical Center Corporation Compositions comprising glucosylated hydroxymethylated bases
US10323269B2 (en) 2008-09-26 2019-06-18 The Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US11208683B2 (en) 2008-09-26 2021-12-28 The Children's Medical Center Corporation Methods of epigenetic analysis
US10443091B2 (en) 2008-09-26 2019-10-15 Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US10465234B2 (en) 2008-09-26 2019-11-05 Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US10508301B2 (en) 2008-09-26 2019-12-17 Children's Medical Center Corporation Detection of 5-hydroxymethylcytosine by glycosylation
US10533213B2 (en) 2008-09-26 2020-01-14 Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US11072818B2 (en) 2008-09-26 2021-07-27 The Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US10793899B2 (en) 2008-09-26 2020-10-06 Children's Medical Center Corporation Methods for identifying hydroxylated bases
US10337053B2 (en) 2008-09-26 2019-07-02 Children's Medical Center Corporation Labeling hydroxymethylated residues
US10767216B2 (en) 2008-09-26 2020-09-08 The Children's Medical Center Corporation Methods for distinguishing 5-hydroxymethylcytosine from 5-methylcytosine
US10612076B2 (en) 2008-09-26 2020-04-07 The Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins
US10428381B2 (en) 2011-07-29 2019-10-01 Cambridge Epigenetix Limited Methods for detection of nucleotide modification
US10563248B2 (en) 2012-11-30 2020-02-18 Cambridge Epigenetix Limited Oxidizing agent for modified nucleotides
US9822394B2 (en) 2014-02-24 2017-11-21 Cambridge Epigenetix Limited Nucleic acid sample preparation
US11566284B2 (en) 2016-08-10 2023-01-31 Grail, Llc Methods of preparing dual-indexed DNA libraries for bisulfite conversion sequencing
US11410750B2 (en) 2018-09-27 2022-08-09 Grail, Llc Methylation markers and targeted methylation probe panel
US11685958B2 (en) 2018-09-27 2023-06-27 Grail, Llc Methylation markers and targeted methylation probe panel
US11725251B2 (en) 2018-09-27 2023-08-15 Grail, Llc Methylation markers and targeted methylation probe panel
US11795513B2 (en) 2018-09-27 2023-10-24 Grail, Llc Methylation markers and targeted methylation probe panel
WO2023081722A3 (en) * 2021-11-02 2023-07-20 Guardant Health, Inc. Quality control method
US12018320B2 (en) 2022-02-18 2024-06-25 The Children's Medical Center Corporation Selective oxidation of 5-methylcytosine by TET-family proteins

Also Published As

Publication number Publication date
GB201413318D0 (en) 2014-09-10
EP3174996A1 (en) 2017-06-07

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15744329; Country of ref document: EP; Kind code of ref document: A1)
REEP Request for entry into the European phase (Ref document number: 2015744329; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)