WO2016063034A1

WO2016063034A1 - Improved nucleic acid sample preparation using concatenation

Info

Publication number: WO2016063034A1
Application number: PCT/GB2015/053122
Authority: WO
Inventors: Tobias William Barr Ost; Helen Rachel Bignell
Original assignee: Cambridge Epigenetix Ltd
Priority date: 2014-10-20
Filing date: 2015-10-20
Publication date: 2016-04-28
Also published as: GB201418621D0

Abstract

The inventors have developed a method to multiplex nucleic acid primer extension reactions. The method involves concatenating primer extension products and analysing the concatenated products using sequencing.

Description

Improved nucleic acid sample preparation using concatenation

This invention relates to the preparation of nucleic acid samples for analysis. One method of obtaining information relating to the nucleic acids in a sample is to sequence the whole sample, for example using 'whole genome' sequencing. Such approaches generate a large amount of sequencing information. Alternative approaches can use single primers which hybridise specifically to a single location in the nucleic acid sample. The primers can be extended, for example using a nucleic acid polymerase, and thus the identity of a single base can be seen. Thus a single location in the sample can be analysed.

Numerous methods of multiplexing primer extension reactions have been reported. One method is to use an array of oligonucleotide primers and hybridise the sample to the array, where primer extension can be carried out. An alternative method performs the primer extension in solution and hybridises the extended products to an array. In both scenarios, the content of the array is pre-defined by the provider of the array who decides which oligonucleotides are on the array.

Primer extension products are typically short sequences of nucleotides, for example 25-70 bases in length. In order to identify the primer and the extended nucleotide, the sequence of the primers must be determined, either by hybridisation to an array of known complementary sequences, or via sequencing. The sequencing of short fragments on high throughput platforms is not cost effective, and hence new methods of preparing nucleic acid samples are required in order to use sequencing as a readout of primer extension reactions. Such methods are described herein.

Summary of the invention

The invention relates to methods of using sequencing as a readout for multiplexed primer extension reactions. Thus the locations of the sample to be analysed can be selected on a sample by sample basis, and may not be reliant on a pre-prepared array of oligonucleotides. Disclosed herein is a method of identifying the nucleotides present at a plurality of locations in a nucleic acid sample, the method comprising;

a. extending a population of loci specific primers where each of the locations in the sample is in the vicinity of a unique hybridised primer;

b. concatenating the extended primers to produce a population of concatenated products; and

c. sequencing the concatenated products.

The concatenated extended primers contain repeated regions of locus primer and regions of extension primer linked together. Thus the 'known' primers and the 'unknown' extension regions are interspersed in the concatenated products. The sequencing readout of each concatenated product takes the form of a locus followed by a short sample read of unknown sequence in repeating sections. The sequence of the locus identifies the location in the sample of the short sample read. The concatenation allows multiple locus-extension products to be sequenced per sequencing read.
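The repeating locus/extension layout of a read can be illustrated with a minimal Python sketch. It is not part of the disclosed method: the function name, the locus names and the sequences are hypothetical, and the sketch assumes exact primer matches with every unit in the forward orientation, which blunt-end ligation would not guarantee in practice.

```python
from typing import Dict, List, Tuple


def split_concatenated_read(read: str, primers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (locus name, extension) pairs found in one concatenated read.

    `primers` maps a hypothetical locus name to its primer sequence.
    Assumes forward-orientation units and exact primer matches only.
    """
    # Record every position at which a known primer sequence occurs.
    hits = []
    for name, seq in primers.items():
        start = read.find(seq)
        while start != -1:
            hits.append((start, start + len(seq), name))
            start = read.find(seq, start + 1)
    hits.sort()

    # Each extension runs from the end of its primer to the start of the
    # next primer (or to the end of the read for the final unit).
    units = []
    for i, (_, end, name) in enumerate(hits):
        next_start = hits[i + 1][0] if i + 1 < len(hits) else len(read)
        units.append((name, read[end:next_start]))
    return units


# Illustrative data only: two made-up primers and three ligated units.
primers = {"locus_1": "ACGTACGTAC", "locus_2": "TTGACCTGAA"}
read = "ACGTACGTAC" + "GGC" + "TTGACCTGAA" + "CAGG" + "ACGTACGTAC" + "GA"
print(split_concatenated_read(read, primers))
```

The known primer sequence acts as the address of each unit, and the bases that follow it are the unknown sample read for that location.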

The method can be carried out using a large number of different primers. For example, the method can be carried out such that at least 100,000 locations are analysed per sample.

Concatenation causes the primer extension products to link together in a chain. The linking causes longer strands of oligonucleotides to be produced. In sequencing applications where hundreds of bases can be read per sequencing read, the linking of multiple primer extension products into a single contiguous chain means that multiple primers can be analysed per sequencing read. The method is particularly advantageous in platforms where long reads of say greater than 1000 bases per fragment can be obtained. The method may be particularly advantageous for the Pacific Biosciences™ platform.
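As a rough illustration of why concatenation helps, the number of whole extension products that fit into one read can be estimated by simple division. The sketch below is an assumption-laden approximation: it ignores adaptor sequence and assumes every product in a read has the same length.

```python
def products_per_read(read_length: int, product_length: int) -> int:
    """Whole extension products that fit in a read of the given length."""
    return read_length // product_length


# Read lengths and product lengths taken from the ranges quoted in the text.
for read_length in (400, 600, 1000):
    for product_length in (25, 70):
        n = products_per_read(read_length, product_length)
        print(f"{read_length} base read, {product_length} base products: {n} per read")
```

An unconcatenated product would occupy a whole read on its own, so the gain scales with the ratio of read length to product length.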

The length of the extension products should be minimised in order to optimise the amount of information obtained from the sequencing process. In order to minimise the length of the extension products, the extension may be carried out with fewer than four dNTP's. The extension may be carried out with two or three dNTP's. In order to terminate the extension, the reactions may be carried out with one or more rNTP's. The extension may be carried out with one rNTP and three dNTP's. In such cases the extension may be inhibited as a DNA polymerase is unable to incorporate a ribo-NTP, and thus extension is prevented in a similar way to the absence of the corresponding dNTP.

In order to make the concatenation reaction efficient, the extended primers may be digested to make blunt ended fragments. The digestion can be carried out with one or more nucleases. The primers are loci specific, meaning that they hybridise to a single location in the nucleic acid sample of interest. The primers can be of different lengths in order to normalise melting temperatures. The primers can be between 15-40 bases in length.

Extension adds one or more nucleotides to each primer. The extension can add 10-30 bases per primer. The amount of extension is dependent on the template sequence being copied, and the absence or presence of the relevant dNTP. The extension will cease either due to the absence of the correct dNTP, or by incorporation of a terminator nucleotide which cannot be further extended. The terminator nucleotide may contain a blocked 3'-terminus. The length of the extension products may be between 25-70 nucleotides.

The extended primers can be concatenated. The concatenation could be chemical or enzymatic. In order to use enzymatic methods, the extension products must contain a 3'-OH to allow for concatenation. The 3'-OH comes from the added dNTP's, or is released by removal of a blocking moiety. The concatenation can be carried out using one or more ligases. The concatenation makes longer strings of nucleic acid fragments where two or more of the primer extension products are connected together. The concatenated products may be any desired length. The concatenated products may be between 400-600 base pairs in length on average.

In order to sequence the concatenated strings of extended primers, universal adaptors may be attached to the ends. The universal adaptors allow amplification using a single pair of primers complementary to the adaptor.

The sequencing may be carried out using a commercially available high throughput sequencing platform. The sequencing may be carried out on a solid support.

The sample may be treated or processed prior to the primer extension. The sample may be bisulfite treated in order to analyse the methylation status of the cytosine bases in the sample. In such cases the sample may be compared with a sample which has not undergone bisulfite treatment.

Disclosed are kits for carrying out the method. The kits may include one rNTP and three dNTP's, a nucleic acid polymerase, one or more nucleases and one or more ligases. Alternatively the kits may contain one terminator dNTP having a 3' block and three dNTP's.

Description of Figures

Figure 1 is a schematic of the method of the invention. The single stranded nucleic acid sample is hybridised with a primer which is extended. The double stranded extension products are made blunt ended. The 25-70 base pair double stranded fragments are ligated together to form concatamers. The resultant double stranded concatameric products averaging 400-600 base pairs in length are processed for sequencing using standard library construction methods. Thus roughly 10-20 primer products can be analysed per sequencing read.

Figure 2 is a schematic of the method of the invention. The single stranded nucleic acid sample is hybridised with a primer which is extended. The double stranded extension products are made blunt ended. The 25-70 base pair double stranded fragments are ligated together to form concatamers. The concatameric products can be any desired length, for example 400-600 base pairs. The resultant double stranded concatameric products are processed for sequencing using hairpin adapters to produce circular constructs for sequencing. Multiple primer products can be analysed per sequencing read.

Figure 3 shows a gel which demonstrates the non-incorporation of UTP. UTP has no effect on PCR at low concentrations, and inhibits incorporation of any dNTP's when high concentrations are reached.

Figure 4 shows a gel which demonstrates that replacement of dTTP with UTP results in strand termination. No extension was observed with no dNTP (Lane 1). Long extension products were observed with dNTP (Lane 2). Terminated extension products were observed with the UTP mix (Lane 3). The position of the T in the sequence correlates with the sizes of the products on the gel. UTP is terminating the extension at the first T. Thus the desired products can be obtained as shown in lane 3.

Figure 5 shows a gel which shows that S1 nuclease removes both ssDNA primers that are not annealed. For the annealed primers, S1 nuclease removes 5' and 3' overhangs to leave 31bp blunt dsDNA.

Figure 6 shows a gel which indicates that concatenation of S1 nuclease treated annealed adapters by ligation is successful. The duplex monomer is 31bp and a ladder of products is observed following exposure to the ligase. PNK or end repair treatment does not improve ligation, suggesting that S1 nuclease does produce 5' phosphorylated blunt dsDNA.

Detailed Description

The invention relates to methods of using sequencing as a readout for multiplexed primer extension reactions. Disclosed herein is a method of identifying the nucleotides present at a plurality of locations in a nucleic acid sample, the method comprising;

a. extending a population of loci specific primers where each of the locations in the sample is in the vicinity of a unique hybridised primer;

b. concatenating the extended primers to produce a population of concatenated products; and

c. sequencing the concatenated products.

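Disclosed herein is a method of identifying the nucleotides present at a plurality of locations in a nucleic acid sample, the method comprising;

a. extending a population of loci specific primers where each of the locations in the sample is in the vicinity of a unique hybridised primer;

b. digesting the extended primers to make blunt ended fragments;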

c. concatenating the blunt ended fragments to produce a population of concatenated products;

d. attaching universal adapters to one or both ends of the concatenated products; and

e. sequencing the concatenated products to produce a sequencing read of each concatenated product, each sequencing read having multiple extended primers per read.

The term identifying the nucleotide refers to establishing whether the nucleotide is A, G, T, C or a modification thereof such as U or methyl C. The nucleotide may be the same as the nucleotide derived from a sample of interest, or may have been modified or altered prior to analysis. The term identifying the nucleotide therefore includes establishing whether a cytosine is methylated.

The location of the base(s) being analysed is determined by the identity of a specific primer. More than one base can be analysed per extended primer. The extension products can be used to analyse nucleotide changes, for example single nucleotide polymorphisms (SNP's) or methylation status, for example whether C bases have been converted to U upon bisulfite treatment. Furthermore the multiple base extensions can give information relating to deletions or insertions of one or more bases. The term identifying the nucleotides thus refers to both the analysis of one or more bases in each extended primer, and relates to a plurality of different primers (locations).

The term loci specific means that the primer hybridises selectively to a single location, or locus, in the nucleic acid sample. The method can be carried out using a large number of different primers which can be pooled prior to hybridisation with the sample. For example, the method can be carried out such that at least 100,000 locations (primers) are analysed per sample. The method can be carried out such that at least 200,000 locations (primers) are analysed per sample. The method can be carried out such that at least 300,000 locations (primers) are analysed per sample. The method can be carried out such that at least 400,000 locations (primers) are analysed per sample.

The location(s) in the sample to be identified is/are in the vicinity of a unique primer. The base(s) to be interrogated should be at the 3'-side of the primer such that nucleotides can be incorporated complementary to the base(s) being analysed. The base(s) to be interrogated may be immediately 3' of the primer such that the first incorporation is being studied, or may be within 2-30 bases of the end of the primer. The interrogated bases can be in different locations for different primers.

The primers can be of different lengths in order to normalise melting temperatures. The primers can be between 15-40 bases in length. Primers having higher levels of A and T bases can be longer than primers having higher levels of C and G bases. The primers should be specifically hybridised at the temperature required for the polymerase extension.
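The trade-off between primer length and base composition can be illustrated with the Wallace rule, a common rough estimate of melting temperature (2 °C per A/T and 4 °C per G/C). The source does not name any particular melting temperature formula, so the choice of rule and the example sequences are assumptions made for illustration; the rule is intended for short oligonucleotides and is only approximate at the longer primer lengths mentioned above.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature estimate in degrees C (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Illustrative sequences only: a 30 base A/T-only primer and a
# 15 base G/C-only primer give the same estimate.
at_rich = "AT" * 15
gc_rich = "GC" * 7 + "G"
print(len(at_rich), wallace_tm(at_rich))
print(len(gc_rich), wallace_tm(gc_rich))
```

Under this estimate an A/T-rich primer needs roughly twice the length of a G/C-rich primer to reach the same melting temperature, which is the reason for allowing primer length to vary.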

The primers can be extended using a suitable nucleic acid polymerase. The polymerase may be a DNA polymerase. The polymerase may be active at room temperature, or may be a thermophilic polymerase. The temperature of the extension reaction can be chosen based on the desired specificity of the primer hybridisation reactions and the length of the primer sequences. The temperature of the extension reaction can be for example between 30-72 °C. The temperature of the extension reaction can be for example between 50-72 °C.

The length of the extension products should be minimised in order to optimise the amount of information obtained from the sequencing process. In order to minimise the length of the extension products, the extension may be carried out with fewer than four dNTP's. The extension may be carried out with two or three dNTP's. In order to terminate the extension, the reactions may be carried out with one or more rNTP's. The extension may be carried out with one rNTP and three dNTP's. The extension may be carried out with one rNTP and one dNTP.

The reaction may be carried out with dCTP and rTTP. The reaction may be carried out with dCTP, dGTP, dATP and rTTP. The reaction may be carried out with dCTP, dGTP, dATP and UTP (UTP is an rNTP). The non-incorporation of the ribo sugar terminates the extension. Alternatively the extension can be terminated using a blocked NTP. The blocked NTP can have a chemical block at the 3' position. The blocking moiety can be a small chemical moiety, for example an allyl, methoxymethyl or azidomethyl group. The blocking NTP can be present with one or more unblocked dNTP's. Alternatively two or more blocked dNTP's can be used. Four dNTP's can be present, of which one, two, three or four can be blocked NTP's. Once the extension reaction has been carried out, the extended primers can be unblocked by removing the block at the 3'-position to release a 3'-OH. In the case of the azidomethyl group, the release can be carried out using a phosphine reagent. In the case of the allyl group, the block can be removed using palladium and a phosphine.
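Termination by omission of a nucleotide can be sketched as a simple simulation. The function below is a hypothetical illustration, not a model of polymerase behaviour: it assumes perfect fidelity, treats a ribonucleotide such as UTP as simply unavailable, and takes the template bases in the order in which the polymerase would encounter them downstream of the primer.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_primer(template_downstream: str, available_dntps: set) -> str:
    """Return the bases added to the primer before extension stops.

    `template_downstream` lists template bases in the order the polymerase
    meets them. Extension stops at the first position whose complementary
    dNTP is not in `available_dntps`.
    """
    extension = []
    for base in template_downstream:
        needed = COMPLEMENT[base]
        if needed not in available_dntps:
            break  # required dNTP is absent, so no further incorporation
        extension.append(needed)
    return "".join(extension)


# Illustrative template only.
template = "GCCTGATC"
print(extend_primer(template, {"A", "C", "G", "T"}))  # all four dNTP's present
print(extend_primer(template, {"A", "C", "G"}))       # dTTP replaced by UTP
```

With dTTP withheld, the extension ends immediately before the first position that would require a T, so the product length depends on where the first A falls in the template.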

The nucleic acid samples are prepared as single stranded, which are then hybridised with the primers. The sample can be fragmented prior to primer hybridisation. The hybridisation can be carried out by heating a population of double stranded fragments, thus melting them to be single strands, and allowing the mixture to cool. Alternatively the sample can be prepared as a single stranded sample without heat denaturing. In cases where the sample is exposed to bisulfite, the nucleic acid fragments in the sample will be single stranded. The fragments can be made single stranded by other chemical treatments, for example exposure to hydroxide.

Extension adds one or more nucleotides to each primer. The extension can add 10-30 bases per primer. The amount of extension is dependent on the template sequence being copied, and the absence or presence of the relevant dNTP. The extension will cease either due to the absence of the correct nucleotide, or by incorporation of a terminator nucleotide which cannot be further extended. The terminator nucleotide may be an rNTP, or may contain a blocked 3'-terminus. The length of the extension products, including the original primer, may be between 25-70 nucleotides. After blunt end digestion, these products will be entirely double stranded, with no single stranded overhangs.

After primer extension, the sample is at least in part double stranded. A portion of the sample will also be single stranded, as the nucleic acid extension will not continue to the end of every fragment. In order to make the concatenation reaction efficient, the sample may be digested to make blunt ended double stranded fragments where all the single stranded nucleic acids are removed. The digestion can be carried out with one or more nucleases. The treatment with nucleases leaves a population of double stranded fragments containing the hybridised primer and the extension of a short length of nucleotides complementary to the sample, including the location of the base(s) whose identity is being analysed. The nuclease can be S1 nuclease or ExoI nuclease. The mixture can contain two nucleases in combination. The mixture can include both S1 nuclease and ExoI nuclease.

The extended primers can be concatenated. If enzymes are used for concatenation, the extension products must contain a 3'-OH to allow for concatenation. The 3'-OH comes from the added dNTP's, or is released by removal of a blocking moiety. The blunt ended duplexes can be treated with a kinase to add a 5'-phosphate group. The kinase can be polynucleotide kinase (PNK). The concatenation can be carried out using one or more ligases. The ligase may be Quick ligase. The concatenation makes longer strings of nucleic acid fragments where two or more of the primer extension products are connected together. The concatenated products may be between 300-1000 base pairs in length. The concatenated products may be between 400-600 base pairs in length. Alternatively the single stranded extension products can be ligated together to form single stranded concatamers. Single stranded ligation can be carried out efficiently if the primer contains a group which is capable of chemically reacting with a moiety on the incorporated nucleotides. Alternatively the single stranded ligation can be mediated enzymatically. One example of enzymatic ligation involves the use of a triphosphate moiety on the primer which reacts, in the presence of a template independent polymerase, with the 3'-hydroxyl of the incorporated nucleotide at the end of the extended primers.

In order to sequence the concatenated strings of extended primers, universal adaptors may be attached to the ends. The universal adaptors allow amplification using a single pair of primers complementary to the adaptor. Many methods exist for the preparation of samples of double-stranded DNA, for example for sequencing (e.g. Illumina TruSeq and NextEra, 454, NEBnext, Life Technologies etc). The adaptors may be hairpin constructs such that the adaptors can form closed circular fragments. The adaptor can attach to both the 3' end of one strand and the 5' end of the opposite strand, thus joining the strands. Attachment of such adapters at both ends of the duplex forms duplex strands closed at both ends, as shown in Figure 2.

If the concatameric products are single stranded, universal adaptors can be attached using single stranded ligation techniques, for example using a template independent polymerase as used to prepare the single stranded concatamers.

The sample being analysed may be treated or processed prior to the primer extension. The sample may be bisulfite treated in order to analyse the methylation status of the cytosine bases in the sample. In such cases the sample may be compared with a sample which has not undergone bisulfite treatment.

Bisulfite sequencing allows 5-methylcytosine to be distinguished from the unmethylated cytosine. In addition to 5-methylcytosine, other cytosine modifications including 5-hydroxymethyl and 5-formyl have been identified. In order to differentiate between these different cytosine modifications, techniques involving oxidation and/or reduction of the samples prior to bisulfite sequencing have been developed. In order to extract value from bisulfite sequencing, the sequencing output must be compared with a sample which has not undergone bisulfite treatment.

Both 5-formylcytosine (5fC) and cytosine (C) are converted to uracil upon bisulfite treatment. Reduction of the formyl group to give 5-hydroxymethylcytosine (5hmC) prior to bisulfite treatment allows C and 5fC to be distinguished. 5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are not affected by bisulfite. Oxidation of the hydroxymethyl group to a formyl group allows the two to be differentiated. A summary of the relevant transformations is shown below:

The structures of the bases are shown below:

[Chemical structures: cytosine (C), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC)]
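The conversions described in the text above can be collected into a lookup table giving the base read at a cytosine position after each treatment. The sketch below is illustrative only: the treatment labels and function name are hypothetical, uracil is assumed to be read as T after copying, and complete conversion is assumed at every step.

```python
# Base read by the sequencer after each treatment. "BS" is bisulfite alone,
# "ox+BS" is oxidation then bisulfite, "red+BS" is reduction then bisulfite.
READOUT = {
    "C":    {"BS": "T", "ox+BS": "T", "red+BS": "T"},
    "5mC":  {"BS": "C", "ox+BS": "C", "red+BS": "C"},
    "5hmC": {"BS": "C", "ox+BS": "T", "red+BS": "C"},  # oxidised to 5fC, then converted
    "5fC":  {"BS": "T", "ox+BS": "T", "red+BS": "C"},  # reduced to 5hmC, then protected
}


def identify_cytosine(bs: str, ox_bs: str, red_bs: str) -> str:
    """Infer the cytosine form from the bases read in the three experiments."""
    observed = {"BS": bs, "ox+BS": ox_bs, "red+BS": red_bs}
    matches = [base for base, expected in READOUT.items() if expected == observed]
    if len(matches) != 1:
        raise ValueError(f"readouts {observed} do not match a single cytosine form")
    return matches[0]


print(identify_cytosine("C", "T", "C"))
print(identify_cytosine("T", "T", "C"))
```

Each of the four forms gives a distinct combination of readouts, which is why comparing the treated samples allows them to be told apart.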

In cases where the samples are bisulfite treated prior to hybridisation, suitably designed primers can be used to reflect the altered composition of the sample. Where C bases have converted to T's, primers with A instead of G can be used. In such cases the same base in the bisulfite and non bisulfite treated samples is interrogated using different primers.
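The effect of bisulfite conversion on primer design can be sketched in silico. The helper names and the example sequence below are hypothetical, and the sketch assumes complete conversion of every unprotected C and a primer that is the exact reverse complement of its binding site.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def bisulfite_convert(strand: str, protected_positions=frozenset()) -> str:
    """Convert every C to T except at protected (e.g. methylated) positions."""
    return "".join(
        "T" if base == "C" and i not in protected_positions else base
        for i, base in enumerate(strand)
    )


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Illustrative primer binding site on the sample strand.
site = "ACCGTCA"
untreated_primer = reverse_complement(site)
treated_primer = reverse_complement(bisulfite_convert(site))
print(untreated_primer)
print(treated_primer)
```

Every G in the primer for the untreated sample that paired with a converted C becomes an A in the primer for the treated sample, so two different primers address the same location.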

The benefits of the method include reducing the cost of analysis of multiplexed primer extension processes. Additionally the ability to identify the sample is not dependent on using commercial microarrays. Commercial arrays are limited by their defined content, and thus the only primers that can be used are the ones on the array. In order to analyse a different locus, or a different organism, a new array is required. The ability to use sequencing as the readout means that any primer can be used. Thus any organism can be studied, and any location in said organism can be determined. The pool of primers can be prepared for each individual experiment as necessary. Another advantage over conventional pull-down methods is that the resultant reads from a primer extended construct will all be "on target", i.e. at the point where the primers were designed to target the template, unlike other hybridisation pulldowns where a bait pulls down a specific region of library prepped DNA, the enriched pool of which is then sequenced. Typical pull down methods use long 'bait' oligonucleotides to pull down even longer fragments of the target sample. The bait can hybridise anywhere within the insert region and there is no guarantee that the region targeted by the bait will be sequenced. Thus a large amount of off-target sequence is generated, which decreases the cost efficiency of the technique. The methods described herein therefore improve the localisation of microarray pull down techniques by more accurately selecting the sample regions of interest.

Disclosed are kits for carrying out the method. The kits may include components selected from one or more nucleic acid primers, one or more rNTP's, one or more dNTP's, one or more blocked NTP's, a nucleic acid polymerase, one or more nucleases and one or more ligases. The kits may include one rNTP and three dNTP's, a nucleic acid polymerase, one or more nucleases and one or more ligases. The kit may contain a large number of primers, for example greater than 100,000.

Disclosed is a method for determining the methylation profile of a nucleic acid sample. The method may include the following steps;

a) bisulfite treating a nucleic acid sample;

b) hybridising the bisulfite treated sample with a population of loci specific primers;

c) extending the loci specific primers using a polymerase and one or more dNTP's;

d) making the extended primers blunt ended duplexes;

e) concatenating the blunt ended duplexes to produce a population of concatenated products; and

f) sequencing the concatenated products.

The method may alternatively include the following steps;

a) bisulfite treating a nucleic acid sample;

b) hybridising the bisulfite treated sample with a population of loci specific primers;

c) extending the loci specific primers using a polymerase and one or more dNTP's;

d) making the extended primers blunt ended duplexes;

e) concatenating the blunt ended duplexes to produce a population of concatenated products;

f) attaching universal adapters to one or both ends of the concatenated products; and

g) sequencing the concatenated products to produce a sequencing read of each concatenated product, each sequencing read having multiple extended primers per read.

The method may include additional steps. The method may include the step(s) of oxidising and/or reducing the sample prior to bisulfite treatment.

The population of nucleic acid molecules may be a sample of DNA or RNA, for example a genomic DNA sample. Suitable DNA and RNA samples may be obtained or isolated from a sample of cells, for example, mammalian cells such as human cells or tissue samples, such as biopsies. In some embodiments, the sample may be obtained from a formalin fixed paraffin embedded (FFPE) tissue sample. Suitable cells include somatic and germ-line cells.

The population may be a diverse population of nucleic acid molecules, for example a library, such as a whole genome library or a loci specific library.

Nucleic acid strands in the population may be amplified nucleic acid molecules, for example, amplified fragments of the same genetic locus or region from different samples.

Nucleic acid strands in the population may be enriched. For example, the population may be an enriched subset of a sample produced by pull-down onto a hybridisation array or digestion with a restriction enzyme.

The samples may be further processed, for example by amplification. The concatenated oligonucleotides may be copied using a nucleic acid polymerase. If adaptors are attached to both ends of the target fragments, the population of fragments can be amplified using a single pair of primers complementary to the adaptors.

In order to further multiplex the readout, the primers from different sources can be separately tagged. The tags can thereby be used to help identify sequences from different sources. If primers are used with different sequences for different sources of biological materials, then the different samples/sources can be pooled for concatenation, but still identified via the tag when the tags and primers are sequenced. Thus the disclosure herein includes the use of two or more different populations of identifiable primers for the multiplexing of the analysis of different samples. Disclosed herein therefore are kits containing two or more primers of different tag sequence as well as the loci sequence. Thus the kit may contain primers with the same loci specific sequence, but different tags to code for different samples.
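Assignment of pooled units to their source samples can be sketched as a prefix lookup. The source does not state where the tag sits within the primer, so the sketch assumes a tag at the 5' end of each primer; the tag sequences, sample names and function name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def demultiplex(units: List[str], tags: Dict[str, str]) -> Dict[str, List[str]]:
    """Group units (tag + locus primer + extension) by their sample tag.

    `tags` maps a tag sequence to a sample name. Assumes the tag is the
    first part of each unit and that no tag is a prefix of another.
    """
    by_sample = defaultdict(list)
    for unit in units:
        for tag, sample in tags.items():
            if unit.startswith(tag):
                by_sample[sample].append(unit[len(tag):])  # strip the tag
                break
        else:
            by_sample["unassigned"].append(unit)
    return dict(by_sample)


# Illustrative data: the same locus primer carrying two different tags.
tags = {"AACC": "sample_A", "GGTT": "sample_B"}
units = ["AACC" + "ACGTACGTAC" + "GGC", "GGTT" + "ACGTACGTAC" + "GGT", "TTTTACGT"]
print(demultiplex(units, tags))
```

Because the locus sequence is identical across samples, only the tag distinguishes which sample each extension came from.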

The concatenated samples can be attached to further adaptors. The sequence of the adaptor oligonucleotide depends on the specific application and suitable adaptor oligonucleotides may be designed using known techniques. A suitable adaptor oligonucleotide may, for example, consist of 20 to 100 nucleotides. The sequence of the adaptor may be selected to be complementary to a suitable amplification/extension primer.

The method may be used in order to prepare samples for nucleic acid sequencing. The method may be used to sequence a population of synthetic oligonucleotides, for example for the purposes of quality control. Alternatively, the sample may come from a population of nucleic acid molecules from a biological sample. The population may be fragments of between 100-10000 nucleotides in length. The fragments may be 200-1000 nucleotides in length. The fragments may be of random variable sequence. The order of bases in the sequence may be known, unknown, or partly known. The fragments may come from treating a biological sample to obtain fragments of shorter length than exist in the naturally occurring sample. The fragments may come from a random cleavage of longer strands. The fragments may be derived from shearing the sample using a physical method such as hydrodynamic shearing. The fragments may be derived from treating a nucleic acid sample with a chemical reagent (for example sodium bisulfite, acid or alkali) or enzyme (for example with a restriction endonuclease or other nuclease). The fragments may come from a treatment step that causes double stranded molecules to become single stranded.

Methods of the invention may be useful in preparing a population of nucleic acid strands for sequencing, for example a population of bisulfite-treated single-stranded nucleic acid fragments. Bisulfite treatment produces single-stranded nucleic acid fragments, typically of about 250-1000 nucleotides in length. The population may be treated with bisulfite by incubation with bisulfite ions (HSO₃⁻). The use of bisulfite ions (HSO₃⁻) to convert unmethylated cytosines in nucleic acids into uracil is standard in the art and suitable reagents and conditions are well known. Numerous suitable protocols and reagents are also commercially available (for example, EpiTect™, Qiagen NL; EZ DNA Methylation™, Zymo Research Corp CA; CpGenome Turbo Bisulfite Modification Kit, Millipore; TrueMethyl™, Cambridge Epigenetix, UK).

The methods disclosed may further include the step of producing one or more copies of the concatenated products. The methods may include producing multiple copies of each of the different concatenated sequences. The copies may be made by hybridising a primer sequence opposite a universal sequence on the adaptor, and using a nucleic acid polymerase to synthesise a complementary copy. The production of the complementary copy provides a double stranded polynucleotide. The double stranded polynucleotides can be amplified using primers complementary to both strands. The amplification can be locus-specific. Locus specific amplification only amplifies a selection of the fragments in the pool and is therefore a selective amplification for certain sequences. Alternatively adaptor sequences can be attached to both ends of the fragments. The attachment of known adaptors at both ends of each fragment can allow amplification of all the fragments in the pool as each fragment possesses two universal ends.

Alternatively the double stranded concatenated polynucleotides may be made circular by attaching the ends together. In some embodiments, double stranded molecules produced by concatenation may be circularised by ligation. The circularisation can be carried out before or after the addition of adaptors. This may be useful in the generation of circular nucleic acid constructs and plasmids or in the preparation of samples for sequencing using platforms that employ circular templates (e.g. PacBio SMRT sequencing). In some embodiments, populations of circularised nucleic acid fragments produced as described herein may be denatured and subjected to rolling circle or whole genome amplification. In cases where the strands have an adaptor region, the amplification can be performed using an amplification primer that hybridises to the adaptor oligonucleotide. Amplification of circular fragments can be carried out using primers complementary to two regions of the single adaptor sequence.

An alternative to locus specific amplification is the use of random priming. Random priming is used in techniques such as whole genome amplification (WGA). Having a universal primer on one end of a population of single stranded fragments and a random primer on the opposite end means that amplification is more efficient than having random primers on both ends, as is the case with WGA.

The concatenated fragments can be used in any subsequent method of sequence determination. For example, the fragments can undergo parallel sequencing on a solid support. In such cases the attachment of universal adaptors to each end may be beneficial in the amplification of the population of fragments. Suitable sequencing methods are well known in the art, and include Illumina sequencing, pyrosequencing (for example 454 sequencing) or Ion Torrent sequencing from Life Technologies™.

Populations of concatenated nucleic acid molecules with a 3' adaptor oligonucleotide and optionally a 5' second adaptor oligonucleotide may be sequenced directly. For example, the sequences of the first and second adaptor oligonucleotides may be specific for a sequencing platform. For example, they may be complementary to the flowcell or device on which sequencing is to be performed. This may allow the sequencing of the population of nucleic acid fragments without the need for further amplification and/or adaptation.

The first and second adaptor sequences may be different. Preferably, the adaptor sequences and any tag sequences, if present, are not found within the human genome, or the particular genome being analysed.

The nucleic acid strands in the population may have the same first adaptor sequence at their 3' ends and the same second adaptor sequence at their 5' ends, i.e. all of the fragments in the population may be flanked by the same pair of adaptor sequences. Suitable adaptor oligonucleotides for the production of nucleic acid strands for sequencing may include a region that is complementary to the universal primers on the solid support (e.g. a flowcell or bead) and a region that is complementary to universal sequencing primers (i.e. which when annealed to the adaptor oligonucleotide and extended allows the sequence of the nucleic acid molecule to be read). Suitable nucleotide sequences for these interactions are well known in the art and depend on the sequencing platform to be employed. Suitable sequencing platforms include Illumina TruSeq, LifeTech IonTorrent, Roche 454 and PacBio RS.

For example, the sequences of the first and second adaptor oligonucleotides may comprise a sequence that hybridises to complementary primers immobilised on the solid support (e.g. 20-30 nucleotides) and a sequence that hybridises to a sequencing primer (e.g. 30-40 nucleotides). Suitable first and second adaptor oligonucleotides may be 56-80 nucleotides in length. The adaptors may be configured as single strands containing both DNA and RNA, or as two or three strands.

Following any preparation step indicated herein, including extension, digestion, ligation, concatenation, adaptation and/or labelling as described herein, the nucleic acid molecules may be purified by any convenient technique. Following preparation, the population of nucleic acid molecules may be provided in a suitable form for further treatment as described herein. For example, the population of nucleic acid molecules may be in aqueous solution in the absence of buffers before treatment as described herein. In other embodiments, populations of nucleic acid molecules with a 3' adaptor oligonucleotide and optionally a 5' adaptor oligonucleotide may be further adapted and/or amplified as required, for example for a specific application or sequencing platform.

Preferably, the nucleic acid strands in the population may have the same first adaptor sequence at their 3' ends and the same second adaptor sequence at their 5' ends i.e. all of the fragments in the population may be flanked by the same pair of adaptors, as described above. This allows the same pair of amplification primers to amplify all of the strands in the population and avoids the need for multiplex amplification reactions using complex sets of primer pairs, which are susceptible to mis-priming and the amplification of artefacts.

Suitable first and second amplification primers may be 20-25 nucleotides in length and may be designed and synthesised using standard techniques. For example, a first amplification primer may hybridise to the first adaptor sequence, i.e. the first amplification primer may comprise a nucleotide sequence complementary to the first adaptor oligonucleotide; and a second amplification primer may hybridise to the complement of the second adaptor sequence, i.e. the second amplification primer may comprise the nucleotide sequence of the second adaptor oligonucleotide. Alternatively, a first amplification primer may hybridise to the complement of the first adaptor sequence, i.e. the first amplification primer may comprise the nucleotide sequence of the first adaptor oligonucleotide; and a second amplification primer may hybridise to the second adaptor sequence, i.e. the second amplification primer may comprise a nucleotide sequence complementary to the second adaptor oligonucleotide.
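As a minimal sketch of the first primer arrangement described above, the following Python derives a pair of amplification primers from a pair of adaptor sequences. The adaptor sequences and the function names are hypothetical placeholders chosen for illustration and are not sequences from this disclosure; all strands are written 5' to 3'.

```python
# Hypothetical 22-nucleotide adaptor sequences (5' to 3'); placeholders only.
FIRST_ADAPTOR = "GTCAGTCAGGATCCATGCAAGT"   # assumed to sit at the 3' end of each strand
SECOND_ADAPTOR = "CATGGTACCTTAGCGATCAGCA"  # assumed to sit at the 5' end of each strand

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplification_primers(first_adaptor, second_adaptor):
    """Return (first_primer, second_primer) for the first arrangement in the text.

    The first primer hybridises to the first adaptor, so it is the reverse
    complement of that adaptor. The second primer hybridises to the complement
    of the second adaptor, so it has the same sequence as the second adaptor.
    """
    return reverse_complement(first_adaptor), second_adaptor


first_primer, second_primer = amplification_primers(FIRST_ADAPTOR, SECOND_ADAPTOR)
print("first primer :", first_primer)
print("second primer:", second_primer)
```

Because every strand carries the same adaptor pair, this single primer pair would be expected to prime all strands in the population.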

In some embodiments, the first and second amplification primers may incorporate additional sequences. Additional sequences may include index sequences to allow identification of the amplification products during multiplex sequencing, or further adaptor sequences to allow sequencing of the strands using a specific sequencing platform.

In some embodiments, a portion of the nucleic acid sample may be oxidised using an oxidising agent. The oxidising agent may be a non-enzymatic oxidising agent, for example, an organic or inorganic chemical compound. Suitable oxidising agents are well known in the art and include metal oxides, such as KRuO₄, MnO₂ and KMnO₄. Particularly useful oxidising agents are those that may be used in aqueous conditions, which are most convenient for the handling of the polynucleotide. However, oxidising agents that are suitable for use in organic solvents may also be employed where practicable.

In some embodiments, the oxidising agent may comprise a perruthenate anion (RuO₄⁻). Suitable perruthenate oxidising agents include organic and inorganic perruthenate salts, such as potassium perruthenate (KRuO₄) and other metal perruthenates; tetraalkyl ammonium perruthenates, such as tetrapropylammonium perruthenate (TPAP) and tetrabutylammonium perruthenate (TBAP); polymer-supported perruthenate (PSP) and tetraphenylphosphonium ruthenate. The oxidising agent may be a metal (VI) oxo complex. The oxidising agent may be manganate (Mn(VI)O₄²⁻), ferrate (Fe(VI)O₄²⁻), osmate (Os(VI)O₄²⁻), ruthenate (Ru(VI)O₄²⁻), or molybdate (Mo(VI)O₄²⁻).

Advantageously, the oxidising agent or the oxidising conditions may also preserve the polynucleotide in a denatured state.

Following treatment with the oxidising agent, the polynucleotides in the first portion may be purified. Purification may be performed using any convenient nucleic acid purification technique. Suitable nucleic acid purification techniques include spin-column chromatography.

The polynucleotide may be subjected to further, repeat oxidising steps. Such steps are undertaken to maximise the conversion of 5-hydroxymethylcytosine to 5-formylcytosine. This may be necessary where a polynucleotide has sufficient secondary structure that it is capable of re-annealing. Any annealed portions of the polynucleotide may limit or prevent access of the oxidising agent to that portion of the structure, which has the effect of protecting 5-hydroxymethylcytosine from oxidation.

In some embodiments, the portion of the population of polynucleotides may for example be subjected to multiple cycles of treatment with the oxidising agent followed by purification. For example, one, two, three or more than three cycles may be performed.

In some embodiments, a portion of the population of polynucleotides comprising the sample nucleotide sequence may be reduced. In other embodiments, a further portion of the population of polynucleotides comprising the sample nucleotide sequence may be reduced. Reduction converts 5-formylcytosine residues in the sample nucleotide sequence into 5-hydroxymethylcytosine.

The portions of polynucleotides may be reduced by treatment with a reducing agent. The reducing agent is any agent suitable for generating an alcohol from an aldehyde. The reducing agent or the conditions employed in the reduction step may be selected so that any 5-formylcytosine is selectively reduced (i.e. the reducing agent or reduction conditions are selective for 5-formylcytosine). Thus, substantially no other functionality in the polynucleotide is reduced in the reduction step. The reducing agent or conditions are selected to minimise or prevent any degradation of the polynucleotide.

Suitable reducing agents are well-known in the art and include NaBH₄, NaCNBH₃ and LiBH₄. Particularly useful reducing agents are those that may be used in aqueous conditions, as such are most convenient for the handling of the polynucleotide. However, reducing agents that are suitable for use in organic solvents may also be employed where practicable.

Following oxidation and reduction respectively, the oxidised and reduced portions of the population are treated with bisulfite. A further portion of the population which has not been oxidised or reduced is also treated with bisulfite. The bisulfite treatment can be carried out separately on the three samples or, if tagged primers are used, the samples can be pooled so that the reduced, oxidised and untreated samples are all exposed to bisulfite in the same reaction.

Bisulfite treatment converts both cytosine and 5-formylcytosine residues in a polynucleotide into uracil. Where any 5-carboxycytosine is present (as a product of the oxidation step), this 5-carboxycytosine is converted into uracil in the bisulfite treatment. Without wishing to be bound by theory, it is believed that the reaction of the 5-formylcytosine proceeds via loss of the formyl group to yield cytosine, followed by a subsequent deamination to give uracil. The 5-carboxycytosine is believed to yield the uracil through a sequence of decarboxylation and deamination steps. Bisulfite treatment may be performed under conditions that convert both cytosine and 5-formylcytosine or 5-carboxycytosine residues in a polynucleotide as described herein into uracil. A portion of the population may be treated with bisulfite by incubation with bisulfite ions (HSO₃⁻). The use of bisulfite ions (HSO₃⁻) to convert unmethylated cytosines in nucleic acids into uracil is standard in the art and suitable reagents and conditions are well known to the skilled person. Numerous suitable protocols and reagents are also commercially available (for example, EpiTect™, Qiagen NL; EZ DNA Methylation™, Zymo Research Corp CA; CpGenome Turbo Bisulfite Modification Kit, Millipore).
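The expected sequencing readout of each cytosine form in the untreated, oxidised and reduced portions can be tabulated with a short sketch such as the one below. The labels and function names are illustrative assumptions. The conversions of cytosine, 5-formylcytosine and 5-carboxycytosine follow the text above; the sketch additionally assumes that 5-methylcytosine and 5-hydroxymethylcytosine are not converted by bisulfite and are read as C, and it ignores incomplete conversion and any over-oxidation to 5-carboxycytosine.

```python
# Forms converted to uracil by bisulfite (read as T), as stated in the text.
# Assumption: 5mC and 5hmC are retained and read as C.
BISULFITE_CONVERTED = {"C", "5fC", "5caC"}


def oxidise_base(base):
    """Oxidation step: 5hmC becomes 5fC; other forms are left unchanged here."""
    return "5fC" if base == "5hmC" else base


def reduce_base(base):
    """Reduction step: 5fC becomes 5hmC; other forms are left unchanged."""
    return "5hmC" if base == "5fC" else base


def bisulfite_read(base):
    """Base call expected after bisulfite treatment and sequencing."""
    return "T" if base in BISULFITE_CONVERTED else "C"


def readout(base):
    """Expected base call in each of the three portions."""
    return {
        "bisulfite only": bisulfite_read(base),
        "oxidised then bisulfite": bisulfite_read(oxidise_base(base)),
        "reduced then bisulfite": bisulfite_read(reduce_base(base)),
    }


for form in ["C", "5mC", "5hmC", "5fC", "5caC"]:
    print(form, readout(form))
```

Under these assumptions, a position that reads C without oxidation but T after oxidation is consistent with 5-hydroxymethylcytosine, and a position that reads T without reduction but C after reduction is consistent with 5-formylcytosine.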

Experimental data:

Termination of extension using UTP

a) Amplification of a 545 bp PhiX174 region by DreamTaq PCR in the presence of UTP. The PCR reaction should proceed as normal as all four dNTPs plus UTP are present.

Materials:

Phi9 primer: GCCACGTATTTTGCAAGCTATTTAACTGG

Phi 15 primer: CGAAGGGGACGAAAAATGGTTTTTAGAGAA

Template: ΦΧ174 RF I DNA

Enzyme: DreamTaq

Method:

Component                Volume (ul)   Final
10X DreamTaq buffer      5             1X
10 mM each dNTPs         1             0.2 mM each dNTP
10 mM UTP                0-10          0-2 mM UTP
4 uM Phi9 primer         6.25          0.5 uM
4 uM Phi15 primer        6.25          0.5 uM
PhiX (1 ng/ul)           1             1 ng
DreamTaq (5 U/ul)        0.25          1.25 U
Water                    x
Total                    50

95C 2mins 25 cycles

72C 15mins

4C forever

10 ul of PCR was loaded on 2% agarose gel at 100V for 1 hr.
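The final concentrations listed for this 50 ul reaction follow from simple dilution (stock concentration multiplied by volume added, divided by total volume). The check below is a minimal sketch; the function name is an illustrative assumption and the numbers are taken from the reaction set-up above.

```python
def final_concentration(stock, volume_ul, total_ul=50.0):
    """Dilution: final = stock x volume added / total volume (units of the stock)."""
    return stock * volume_ul / total_ul


dntp_mM = final_concentration(10, 1)       # 10 mM each dNTP, 1 ul added
utp_max_mM = final_concentration(10, 10)   # 10 mM UTP, up to 10 ul added
primer_uM = final_concentration(4, 6.25)   # 4 uM primer, 6.25 ul added
buffer_x = final_concentration(10, 5)      # 10X buffer, 5 ul added
enzyme_units = 5 * 0.25                    # 5 U/ul x 0.25 ul, reported as total units

print(dntp_mM, utp_max_mM, primer_uM, buffer_x, enzyme_units)
```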

Results are shown in Figure 3. UTP concentrations >1 mM inhibited amplification by DreamTaq PCR. A low level of UTP had no effect on the PCR.

b) Extension termination using a single locus-specific PhiX primer and UTP

A single primer was annealed to Phix and extended in the presence of 0.2 mM UTP, dGTP, dCTP and dATP. DreamTaq should incorporate 16 nucleotides before terminating at the first T. Controls included were no dNTPs (no extension) and 0.2 mM dNTPs (full extension). The extension products were observed on gels.

Materials:

Phi2 primer: TCTTTAGTCGCAGTAGGCGGAAAA

Template: ΦΧ174 RF I DNA

Enzyme: DreamTaq

Method:

Component (volumes in ul)       1       2       3       Final
10X DreamTaq buffer             5       5       5       1X
10 mM each dNTPs                0       1       0       0.2 mM
10 mM dCTP, dGTP, dATP, UTP     0       0       1       0.2 mM
10 uM Phi2 primer               3.5     3.5     3.5     0.7 uM
PhiX (1 ug/ul)                  5       5       5       5 ug
DreamTaq (5 U/ul)               0.25    0.25    0.25    1.25 U
Water                           X       X       X
Total                           50      50      50

95C 2mins 50 cycles

4C forever

All samples were denatured at 95C for 2 mins and snap cooled on ice. Ran 10 ul on 4-20% TBE PAGE gel at 200V for 35 mins.

Results are shown in Figure 4. No extension was observed with no dNTP (1). Long extension products were observed with dNTP (2). Terminated extension products were observed with the UTP mix (3). The position of the T in the sequence correlates with the sizes of the products on the gel. UTP is terminating the extension at the first T. Thus the desired products can be obtained as shown in lane 3.

Phi2 primer (first 24 bases) followed by the downstream sequence:

TCTTTAGTCGCAGTAGGCGGAAAACGAACAAGCGCAAGAGTAAACATAGTGCCATGCTC
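The expected termination point can be checked against the sequence shown above. The sketch below counts the bases added to the Phi2 primer before the first position at which a T would be required; the function name is an illustrative assumption, and the sequences are those given in this example with the spacing removed.

```python
PHI2_PRIMER = "TCTTTAGTCGCAGTAGGCGGAAAA"
# Primer plus downstream sequence of the extended strand, as shown in the text.
EXTENDED_REGION = "TCTTTAGTCGCAGTAGGCGGAAAACGAACAAGCGCAAGAGTAAACATAGTGCCATGCTC"


def bases_before_first(primer, region, terminator="T"):
    """Count bases added to the primer before the first `terminator` position."""
    if not region.startswith(primer):
        raise ValueError("region must begin with the primer sequence")
    downstream = region[len(primer):]
    position = downstream.find(terminator)
    # If no terminator position exists, extension would run to the end.
    return len(downstream) if position == -1 else position


added = bases_before_first(PHI2_PRIMER, EXTENDED_REGION)
print(added)
```

For this primer the count is 16, in agreement with the 16 nucleotides stated above.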

S1 nuclease digests ssDNA 5' and 3' overhangs to leave blunt dsDNA

Two primers were annealed that gave 5' and 3' overhangs. The overhangs were removed by S1 nuclease digest.

[Diagram: Primer 94 (70 bases) annealed to Primer 10 (31 bases, 5' phosphate), leaving single-stranded 5' and 3' overhangs.]

Materials:

Primer 10: 5' phosphate GATCGGAAGAGCACACGTCTGAACTCCAGTC

Primer 94:

CAAGCAGAAGACGGCATACGAGATGCGGACGTGACTGGAGTTCAGACGTGTGCT CTTCCGATCTNNNNNN

Enzymes: S1 nuclease

Purification: Qiagen nucleotide removal kit
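As a rough check on the duplex and overhang lengths expected when the two oligonucleotides listed above anneal, the following sketch locates the reverse complement of Primer 10 within Primer 94. It assumes perfect Watson-Crick pairing over the full length of Primer 10; the function names are illustrative.

```python
PRIMER_94 = ("CAAGCAGAAGACGGCATACGAGATGCGGACGTGACTGGAGTTCAGACGTGTGCT"
             "CTTCCGATCTNNNNNN")
PRIMER_10 = "GATCGGAAGAGCACACGTCTGAACTCCAGTC"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_and_overhangs(long_oligo, short_oligo):
    """Return (5' overhang, duplex, 3' overhang) lengths on the long oligo."""
    site = reverse_complement(short_oligo)
    start = long_oligo.find(site)
    if start == -1:
        raise ValueError("short oligo is not fully complementary to the long oligo")
    return start, len(site), len(long_oligo) - start - len(site)


five_prime, duplex, three_prime = duplex_and_overhangs(PRIMER_94, PRIMER_10)
print(five_prime, duplex, three_prime)
```

With these sequences the 70-base Primer 94 has a 32-base 5' overhang, a 31 bp duplex region and a 7-base 3' overhang, so removal of both overhangs would leave a 31 bp blunt duplex.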

Method: Annealing of oligos

Component (volumes in ul)   1       2       3       Final
Primer 94 (4 uM)            14.3    0       14.3    0.29 uM
Primer 10 (4 uM)            0       14.3    14.3    0.29 uM
10X DreamTaq buffer         20      20      20      1X
Water                       X       X       X
Total                       200     200     200

95C 2.5mins

50C 1 min

4C forever

The annealed oligos were purified using the Qiagen nucleotide removal kit, eluting in 50 ul EB.

S1 nuclease treatment

Annealed oligos 11

5X S1 nuclease buffer 6 (1X)

S1 nuclease (10 U/ul) 0 (10 U)

Water

30 ul 30 ul

20C for 30 mins.

The reaction was stopped with 2 ul 0.5 M EDTA.

The S1 nuclease reactions were purified using the Qiagen nucleotide removal kit, eluting in 30 ul EB. Ran 10 ul on 4-20% TBE PAGE gel at 200V for 30 mins.

Results seen in Figure 5 show that S1 nuclease removes both ssDNA primers that are not annealed. For the annealed primers, S1 nuclease removes 5' and 3' overhangs to leave 31 bp blunt dsDNA.

[Diagram: Primer 94 (70 bases) annealed to Primer 10 (5' phosphate); after S1 nuclease digestion, 31 bases remain.]

S1 nuclease digested DNA can be ligated

S1 nuclease treatment should produce blunt dsDNA with 5' phosphates ready for concatenation by ligation. The S1 nuclease treated annealed oligos (from above) were concatenated using Quick ligase.
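If ligation joins the 31 bp blunt duplexes end to end, the products are expected to form a ladder of multiples of 31 bp. The sketch below lists the expected sizes and the number of units needed to reach a given length; the function names are illustrative, and the 400 bp target is the minimum concatenated length recited in the claims.

```python
MONOMER_BP = 31  # blunt duplex length after S1 nuclease digestion, from the text


def ladder_sizes(monomer_bp, max_units):
    """Sizes (bp) of concatemers made of 1..max_units monomers."""
    return [monomer_bp * n for n in range(1, max_units + 1)]


def units_to_reach(monomer_bp, target_bp):
    """Smallest number of monomers whose concatemer is at least target_bp long."""
    return -(-target_bp // monomer_bp)  # ceiling division


print(ladder_sizes(MONOMER_BP, 6))
units = units_to_reach(MONOMER_BP, 400)
print(units, units * MONOMER_BP)
```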

Materials:

Enzymes: T4 PNK, Illumina end repair mix, Quick ligase

Purification: Qiagen nucleotide removal kit

Method:

Took S1 nuclease treated annealed oligos from above and divided into three.

1) Ligation

2) T4 PNK treatment (adds 5' phosphates and removes 3' phosphates) and ligation

3) Illumina end repair (blunts DNA, adds 5' phosphates and removes 3' phosphates) and ligation

T4 PNK

S1 nuclease annealed oligos 10

10X T4 PNK buffer (1X)

10 mM ATP (1 mM)

Water

T4 PNK (10 U/ul) (10 U)

37C for 30 mins.

Purified using Qiagen nucleotide removal kit, eluting in 30 ul EB.

End repair

S1 nuclease annealed oligos 10

Illumina end repair mix 40

Water 50

100 ul

30C for 30 mins.

Purified using Qiagen nucleotide removal kit, eluting in 30 ul EB.

Quick ligation

Component (volumes in ul)   1       2       3
DNA not modified            10      0       0
T4 PNK DNA                  0       30      0
End repaired DNA            0       0       30
2X Quick ligation buffer    50      50      50
Quick ligase                2.5     2.5     2.5
Water                       X       X       X
Total                       100     100     100

20C for 15 mins.

Purified using Qiagen nucleotide removal kit, eluting in 30 ul EB. Loaded 10 ul on 4-20% TBE PAGE gel. 200V for 30 mins.

Results are shown in Figure 6. The results indicate that concatenation of the S1 nuclease treated annealed adapters by ligation is successful. The duplex monomer is 31 bp and a ladder of products is observed following exposure to the ligase. PNK or end repair treatment does not improve ligation, suggesting that S1 nuclease does produce 5' phosphorylated blunt dsDNA.

Claims:

1. A method of identifying the nucleotides present at a plurality of locations in a nucleic acid sample, the method comprising;

b. digesting the extended primers to make blunt ended fragments;

e. sequencing the concatenated products to produce a sequencing read of each concatenated product, each sequencing read having multiple extended primers per read.

2. The method of claim 1 wherein at least 100,000 locations are analysed per sample.

3. The method of claim 1 wherein the extension is carried out with less than four dNTP's.

4. The method of claim 3 wherein the extension is carried out with two or three dNTP's.

5. The method of claim 1 wherein the extension is carried out with one or more rNTP's.

6. The method of claim 5 wherein the extension is carried out with one rNTP and three dNTP's.

7. The method of any preceding claim wherein the extended primers are between 25-70 bases in length.

8. The method of any preceding claim wherein the concatenated products are at least 400 base pairs in length.

9. The method of any preceding claim wherein the concatenated products average between 400-600 base pairs in length.

10. The method of any preceding claim wherein the adaptors are hairpin constructs such that the adaptors form closed circular fragments when attached to the concatenated products.

11. The method of claim 1 wherein the concatenated products are amplified.

12. The method of claim 1 wherein the sequencing is carried out on a solid support.

13. The method of any preceding claim wherein the nucleic acid sample is treated with bisulfite.

14. The method of claim 13 wherein a portion of the sample is treated with bisulfite and the sequence of the bisulfite treated sample is compared with the sequence of a sample which was not bisulfite treated.

15. A method for determining the methylation profile of a nucleic acid sample comprising;

a) bisulfite treating a nucleic acid sample;

b) hybridising the bisulfite treated sample with a population of loci specific primers;

c) extending the loci specific primers using a polymerase and one or more dNTP's;

d) making the extended primers blunt ended duplexes;

f) attaching universal adapters to one or both ends of the concatenated products; and

g) sequencing the concatenated products to produce a sequencing read of each concatenated product, each sequencing read having multiple extended primers per read.

16. A kit comprising one rNTP and three dNTP's, a nucleic acid polymerase, one or more nucleases and one or more ligases.