GB2447679A

GB2447679A - Scanning probe microscopy-based polynucleotide sequencing and detection

Info

Publication number: GB2447679A
Application number: GB0705367A
Authority: GB
Inventors: Jean Ernest Sohna Sohna
Original assignee: Individual
Current assignee: Individual
Priority date: 2007-03-21
Filing date: 2007-03-21
Publication date: 2008-09-24
Also published as: GB0705367D0

Abstract

A method for the sequencing of target polynucleotides comprises the steps of (i) performing the polymerase reaction to extend suitable primers hybridised to the target polynucleotides using labelled nucleotide triphosphates; (ii) scanning flat substrates containing ultrahigh densities of the labelled single stranded or double stranded polynucleotides obtained in step (i) with scanning probe microscopy (SPM); and (iii) analysing the images recorded from scanning to obtain at once the complete sequences of all the polynucleotides immobilised on the substrate. Methods for the detection with SPM of polynucleotides using oligonucleotide probes prepared from the labelled nucleosides and immobilised on a flat surface are also provided. The nucleotides are typically labelled with a moiety suitable for SPM detection attached via a non-cleavable linker to the base.

Description

TITLE OF THE INVENTION: SCANNING PROBE MICROSCOPY-BASED POLYNUCLEOTIDES DETECTION AND SEQUENCING.

DESCRIPTION OF THE INVENTION

FIELD OF THE INVENTION

This invention relates to the detection and the sequencing of polynucleotides. In particular, the invention describes modified labelled nucleotides and nucleosides and methods for detecting and analysing the sequence of polynucleotides using scanning probe microscopy (SPM) and sequencing by primer extension (SPREX).

BACKGROUND TO THE INVENTION

The Sanger DNA sequencing approach introduced in 1977 revolutionised biological science and allowed the sequencing of nucleic acids such as DNA and RNA.

The Sanger methods are based on chain termination and rely on the use of labelled dideoxy derivatives of the four nucleotide triphosphates which are incorporated into a nascent polynucleotide chain in a polymerase reaction. Upon incorporation, the dideoxy derivatives terminate the polymerase reaction. The labelled polynucleotide fragments obtained are size fractionated using gel electrophoresis and analysed to determine the order of bases.

Developments of automated fluorescent DNA sequencers based on this approach and rapid increases in instrument throughput enabled the completion of a blue print of the Human Genome.

Despite the progress made, Sanger-based methods are slow, labour intensive and expensive. It still costs an estimated US$10 million to sequence a mammalian genome. There is a need to develop new sequencing technologies to reduce the cost of sequencing a mammalian genome to US$100,000 and ultimately US$1000 or less. The attainment of this goal will enable the sequencing of each person's genome, which will lead to individualised approaches for diagnosing, treating and preventing disease.

The so-called next generation sequencing technologies introduced in the past few years are non-electrophoretic and are mainly based either on sequencing by synthesis (SBS) or on the use of nanopores associated with various detection systems.

The concept of sequencing by synthesis (SBS) involves the detection of the identity of each nucleotide immediately after its incorporation into a growing strand of DNA in a polymerase reaction.

One approach to SBS relies on the use of modified nucleotides as reversible terminators, in which a different fluorophore with a distinct fluorescent emission is linked to each of the four bases through a cleavable linker and the 3'-OH group is capped by a small chemical moiety. DNA polymerase incorporates only a single nucleotide analogue complementary to the base on a DNA template covalently linked to a surface. After incorporation, the unique fluorescence emission is detected to identify the incorporated nucleotide and the fluorophore is subsequently removed chemically or photochemically. The 3'-OH group is then chemically regenerated, which allows the next cycle of the polymerase reaction to proceed. SBS is performed on arrays of single polynucleotides or on clusters of identical polynucleotides obtained through a localised amplification of the polynucleotide to be sequenced (WO 2005-065814).

Another approach to SBS relies on pyrosequencing, which is a real-time sequencing strategy based on the release of pyrophosphate during enzymatic DNA synthesis. A first nucleotide triphosphate is introduced into the polymerase reaction mixture; when it is the correct complement to the target strand, its incorporation results in the release of pyrophosphate, which is converted by an enzymatic reaction to a detectable chemiluminescent signal. The other three nucleotides are then added independently in an iterative process.

The most successful approach to date developed by 454 Life Sciences uses the amplification of a DNA fragment immobilized on a bead from a single fragment to several million identical copies. This amplification is necessary to generate sufficient identical DNA to obtain a strong signal from the sequencing reaction.
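The iterative nucleotide addition used in pyrosequencing may be illustrated by the following minimal sketch. It is an idealised simulation only: the function names, the fixed flow order and the assumption that the signal is exactly proportional to the number of bases incorporated are illustrative choices and are not taken from any cited method.

```python
# Idealised pyrosequencing simulation (illustrative assumptions throughout).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pyrosequence(template, flow_order="ACGT"):
    """Return a flowgram as a list of (nucleotide, incorporations) pairs.

    `template` lists the template bases in the order the polymerase meets
    them and must contain only A, C, G and T. One nucleotide is flowed at
    a time; it is incorporated as long as it complements the next
    template base, so a homopolymer run gives a count above 1.
    """
    position = 0
    flowgram = []
    while position < len(template):
        for nucleotide in flow_order:
            incorporated = 0
            while (position < len(template)
                   and COMPLEMENT[template[position]] == nucleotide):
                incorporated += 1
                position += 1
            flowgram.append((nucleotide, incorporated))
            if position == len(template):
                break
    return flowgram


def call_bases(flowgram):
    """Rebuild the synthesised strand from the flowgram."""
    return "".join(nucleotide * count for nucleotide, count in flowgram)


if __name__ == "__main__":
    flows = pyrosequence("TTGCA")
    print(flows)
    print(call_bases(flows))
```

In this sketch each flow yields at most one signal, so the number of reagent additions grows with the read length, which is the repetitive cost discussed in this section.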

Although these methods are now commercially available, they are not believed likely to deliver the US$1000 genome. In fact, the estimated cost of sequencing a human genome is still around US$1 million.

Nanopore approaches.

A number of research groups rely on nanopore-based sequencing methods to deliver ultra-fast and inexpensive genome sequences. The underlying principle of nanopore sequencing is that a single-stranded polynucleotide molecule is electrophoretically driven through a nanoscale pore in such a way that the bases traverse the pore sequentially.

The detection mechanisms incorporated into the nanopore use the distinct electrical and physical properties of each of the bases. Theoretically, very long reads of polynucleotide sequences can be achieved in extremely short time scales.

Despite their great potential for improvement in speed, read length and sensitivity, none of these methods has yet been shown to achieve single-nucleotide resolution of polynucleotides. An overview of these sequencing approaches can be found on the internet site of the National Human Genome Research Institute (NHGRI) at www.genome.gov.

Scanning probe microscopy (SPM) approaches.

The atomic force microscope (AFM) was invented in 1986 by Binnig. This technology uses a tip attached to the end of a flexible cantilever. The distance between the AFM tip and a target surface is controlled by a piezoelectric device. Scanning the tip across the surface causes the cantilever to deflect. This deflection is due to interactions between the tip and the surface. The deflection is measured by a laser reflected from the surface of the cantilever. This process generates a topographic image of the target surface. The resolution of AFM depends in part on the radius and shape of the tip.

AFM was used to scan closely packed plasmid DNA adsorbed to a cationic lipid bilayer surface. High-resolution images of the DNA double helix with the expected pitch of 3.4 nanometres were obtained (FEBS Letters 1995, 371, 279-282; Journal of Physical Chemistry B 1997, 101, 441-449). This remarkable result triggered speculation that single-nucleotide resolution of polynucleotides might be achieved using more refined tips and that this would be the basis of DNA and RNA sequencing using AFM. Although measurable improvements in AFM resolution have been achieved using single-walled carbon nanotube tips (Nature Biotechnology 2000, 18, 760-763), sequencing by AFM remains an elusive goal.

Lindsay of Arizona State University has proposed a new sequencing technology using Atomic Force Microscopy (AFM) in combination with naturally occurring ring-shaped sugar molecules called cyclodextrins. The method relies on the ring molecules, when coupled to the AFM probe, to serve as sensors to read the sequence of the DNA bases. The cyclodextrins are just big enough to slide a strand of DNA through.

Lindsay proposes to attach the reactive groups on the ring to the tip of an AFM, which would thread an anchored DNA molecule into the ring and pull it through, recording the subtle variations in friction between the different DNA bases and the ring. The resulting data would be translated into the precise sequence of the DNA molecule. This technology has to overcome important technical hurdles to achieve its potential, and no data have been published yet.

Rouvain in US 20040214177 describes a closely related sequencing approach in which a device comprises a location for the placement of at least one cyclic molecule (e.g. rotaxane) and a linear polymer (e.g. DNA) is threaded through said cyclic molecule; the tip of an SPM is attached to the cyclic molecule or the linear polymer and a signal resulting from the interaction of the cyclic molecule and each unit of the polymer is produced and read to identify the sequence of the polymer.

Another DNA sequencing method proposed by the Mechatronics Research Laboratory at the Massachusetts Institute of Technology uses an atomic force microscope to measure the specific binding interaction of nucleotides. A single molecule of DNA is denatured and immobilized on an atomically flat surface, and a force probe functionalised with a nucleotide is scanned along the molecule to detect locations of the nucleotide's complement. This method is under development and has not yet demonstrated single nucleotide resolution useful for DNA sequencing.

Scanning tunnelling microscopy (STM) was invented by Binnig before the AFM. In STM, a metallic tip is brought very close to a conductive substrate and, when a voltage is applied between the two conductive media, a tunnelling current flows between the two electrodes. The direction of the tunnelling depends on the bias polarity. The exponential distance dependence of the tunnelling current leads to excellent control of the distance between the tip and the surface, enabling the achievement of very high resolution on atomically flat conductive substrates. For imaging purposes, the tip and substrate are scanned precisely relative to one another and the current is monitored as a function of the lateral position. The contrast in STM images reflects both topography and electronic effects.

The STM was unexpectedly found to give high-resolution images of biological macromolecules such as DNA on mica (an insulator) in humid air (Guckenberger et al., Science 1994, 266, 1538-1540). The STM resolution on humid DNA molecules appears to reveal the major groove of the DNA double helix, which has a pitch of 3.4 nm. This level of resolution, which corresponds to approximately 10 base pairs, does not allow direct sequencing of non-modified DNA and polynucleotides using STM.
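The gap between this resolution and what sequencing requires may be made explicit with a short calculation, taking the helical pitch of 3.4 nm and the figure of approximately 10 base pairs per turn stated above:

```latex
\[
  d_{\text{bp}} \;\approx\; \frac{3.4\ \text{nm}}{10\ \text{base pairs}}
  \;=\; 0.34\ \text{nm per base pair}
\]
```

Direct reading of unmodified bases would therefore require features about ten times finer than the groove periodicity to be resolved along the helix axis. This estimate is an idealisation that ignores the local variation in base spacing.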

Researchers have proposed the use of nanolabels or nanocodes to enable polynucleotide detection and sequencing using STM.

Yamakawa et al. (US 20050147981) propose the use of a series of labelled oligonucleotides, each including a known nucleotide sequence and a molecular nanocode, to bind to a nucleic acid of unknown sequence. The nanocodes are selected from the group consisting of carbon nanotubes, fullerenes, submicrometer metallic barcodes, nanoparticles and quantum dots. The detection of the nanocodes using scanning probe microscopy (SPM) allows the determination of the sequence of the target nucleic acid from the sequences of the labelled oligonucleotides. Some knowledge of the sequence of the target nucleic acid is necessary to choose the labelled oligonucleotides for the experiment.

A substrate with pathways to align labelled DNA molecules for sequencing by scanning tunnelling microscopy (STM) has also been disclosed by Sargent et al. in PCT International Application WO 96/24689. In this method, the nucleic acid molecule is modified with a base-specific label by analog incorporation during synthesis or by complementation. The nucleotide sequence is determined by orienting single-stranded nucleic acid molecules on a surface having one or more linear alignment paths; the path is scanned using SPM and the presence of base-specific labels along the length of the molecule is recorded. The process is repeated in other paths with labels specific for each base, and the data sets thus provided are combined to give the complete nucleotide sequence.

Henderson et al. (US 6716578) propose a method for determining the order of nucleic acid segments from a target nucleic acid. The method comprises tagging sequence-specific sites of the target nucleic acid with a sequence-specific tag, scanning the target nucleic acid using a scanning probe microscope, and analysing the scan to determine the order of nucleic acid segments. This method does not provide single-nucleotide resolution and is not easily applicable to the sequencing of unknown nucleic acids.

The SPM-based sequencing technologies of prior art are difficult to implement and have not yet been able to demonstrate the individual base resolution necessary for sequencing polynucleotides.

There is still a need for a technology with the potential to revolutionise genetic analysis by dramatically improving the speed and reducing the cost of a range of genetic analysis applications, including whole-genome de novo sequencing and resequencing and expression profiling.

We propose herein an invention that combines sequencing by primer extension (SPREX) and scanning probe microscopy in a novel way compared to prior art.

SUMMARY OF THE INVENTION

The invention relates to novel labelled nucleoside phosphoramidites and nucleotide triphosphates and methods for polynucleotide sequencing using Sequencing by Primer Extension (SPREX) and scanning probe microscopy (SPM). The invention relates also to methods for detecting nucleic acids.

The nucleic acids and polynucleotides referred to herein are naturally occurring or synthetic; they include but are not limited to genomic DNA, cDNA and mRNA.

Viewed from a first aspect, the invention describes the chemical structures of labelled nucleoside phosphoramidites and nucleotide triphosphates. The labelled nucleoside phosphoramidites may be used in the synthesis of oligonucleotides using automated synthesisers. The labelled nucleotide triphosphates may be incorporated, in solution or on a solid support, into a nascent strand complementary to the target polynucleotides by naturally occurring or genetically modified polymerases. In contrast to sequencing-by-synthesis methods in prior art, the whole complementary strand is synthesised in one step. The nature of the base incorporated is not identified after each single incorporation; instead, the sequence of all the bases is determined at once after the complete synthesis of the complementary strand.

The molecular labels (MLA) are attached to the base of the nucleotide or nucleoside through a spacer.

The molecular label may have a contour length from a few angstroms to 50 nm, more preferably from 1 to 10 nm. Its chemical structure is based on but not limited to an alkyl or alkenyl chain, a polypeptide, a modified polypeptide, or a neutral or charged oligomer or polymer. The preferred example of MLA is a positively or negatively charged polyelectrolyte chain; among polyelectrolytes, rigid and rod-like chains are preferred.

Another preferred example of MLA is a conjugated, conducting or semi-conducting oligomer or polymer chain based on but not limited to pyrrole, thiophene, phenylene ethynylene, phenylene vinylene, aniline, pyridine and acetylene moieties.

Other π-electron-rich compounds including but not limited to phthalocyanines, subphthalocyanines and porphyrins are also examples of MLA.

Electro-active moieties including but not limited to ferrocenyl compounds and transition metal complexes are also examples of MLA included in the invention.

Conjugated, rigid and rod-like chemical moieties are the most preferred examples of MLAs.

The MLAs are designed to favour the stable physisorption or chemisorption of the modified oligonucleotides on the substrates prior to scanning.

It will be apparent after examining the entire invention that certain agents that are related may be substituted for the molecular labels described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the present invention.

Viewed from a second aspect, the invention describes the synthesis of complementary strands to target polynucleotides. These complementary strands are obtained by primer extension using the labelled nucleotides and a suitable polymerase. The double stranded or single stranded labelled polynucleotides obtained are then immobilised on a flat surface. In some embodiments the target polynucleotides are immobilised on a flat substrate prior to the synthesis of their complementary strands incorporating the labelled nucleotides.

Viewed from a third aspect, the invention describes modifications of atomically flat surfaces that may be used for polynucleotide immobilisation and scanning probe microscopy.

Viewed from a fourth aspect, the invention provides unprecedented ultra high densities of immobilised polynucleotides that may enable the immobilisation and analysis of a whole genome on a single chip. The scanning of the chip surface with scanning probe microscopy is then used to determine the sequence of all the immobilised polynucleotides.

Viewed from a fifth aspect, the invention describes the sensitive detection of polynucleotides using oligonucleotide probes incorporating the labelled nucleosides described herein in hybridisation assays combined with SPM.

In summary:

1) In the case of polynucleotide sequencing, the invention may comprise the following steps: i) synthesis of complementary strands to target single stranded polynucleotides using the modified nucleotide triphosphates of the invention; ii) immobilisation at a very high density of the double or single stranded newly synthesised polynucleotides onto a flat surface (this second step is omitted when the first step is carried out on polynucleotides already immobilised on a surface); iii) determination of the base sequences of all the immobilised polynucleotides on the substrate using scanning probe microscopy techniques including but not limited to Atomic Force Microscopy (AFM), Scanning Tunnelling Microscopy (STM), Scanning Electrochemical Microscopy (SECM) and Scanning Near-field Optical Microscopy (SNOM).

2) In the case of nucleic acid detection, the invention may comprise the following steps: i) synthesis of single stranded oligonucleotide probes containing at least two modified nucleotides, ii) contacting the probes with unknown polynucleotides in conditions favouring hybridisation, iii) immobilisation of the products of the reaction on a flat surface, iv) scanning of the flat substrate using scanning probe microscopy and analysis of the image obtained to identify hybridisation events. In an alternative, the single stranded probes may be immobilised on the flat substrate prior to the hybridisation experiment with unknown polynucleotides. The identification of hybridisation events is based on the fact that the binding of a complementary strand to the probe triggers structural changes that can be readily visualised using SPM.
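The final step of the sequencing method in item 1) may be illustrated by the following minimal sketch, which assumes that image analysis has already produced an ordered list of base-specific labels along one extended strand. The label identifiers ("MLA_A" and so on) and the function names are hypothetical and are introduced only for illustration. Because the labelled strand is complementary to the target, the target sequence is obtained as its reverse complement.

```python
# Hypothetical label identifiers; the specification does not name label codes.
LABEL_TO_BASE = {"MLA_A": "A", "MLA_C": "C", "MLA_G": "G", "MLA_T": "T"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def decode_extended_strand(labels):
    """Translate labels read 5'->3' along the extended strand into bases."""
    return "".join(LABEL_TO_BASE[label] for label in labels)


def infer_target(extended_strand):
    """Return the target sequence (5'->3') as the reverse complement
    of the labelled strand synthesised by primer extension."""
    return "".join(COMPLEMENT[base] for base in reversed(extended_strand))


if __name__ == "__main__":
    labels_from_image = ["MLA_A", "MLA_A", "MLA_C", "MLA_G"]
    extended = decode_extended_strand(labels_from_image)
    print(extended)                # labelled strand
    print(infer_target(extended))  # inferred target strand
```

The sketch treats a single strand; in the method described, the same decoding would be applied to every polynucleotide located in the scanned image.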

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1: Figure 1a shows a schematic representation of the structures of the labelled nucleotide triphosphates that may be used according to the present invention. The example of an amino propargyl group is chosen between the base and the spacer, but other groups known to those skilled in the art may be used. Figure 1b illustrates the schema of Figure 1a in the case of deoxy nucleotide triphosphates (dNTPs). The spacer and the molecular label (MLA) are not drawn to scale.

Figure 2: Figure 2a shows a schematic representation of the structures of the labelled nucleotide phosphoramidites that may be used according to the present invention. The example of an amino propargyl group is chosen between the base and the spacer, but other groups known to those skilled in the art may be used. Figure 2b illustrates the schema of Figure 2a in the case of deoxy nucleotide phosphoramidites. The spacer and the molecular label (MLA) are not drawn to scale.

Figure 3: Figure 3 shows examples of molecular label (MLA) structures based on phenylene ethynylene, phenylene, phenanthroline ethynylene and phenylenevinylene. X is any reactive chemical group known in the art including but not limited to amine and carboxylic acid.

Figure 4: Figure 4 shows examples of MLA structures based on thiophene, pyrrole and furan. X is any reactive chemical group known in the art including but not limited to amine, carboxylic acid.

Figure 5: Figure 5 shows examples of MLA structures based on aniline and fluorene.

Figure 6: Fig 6 shows examples of MLA structures based on oligopeptides and incorporating natural and non-natural amino acids.

Figure 7: Fig 7 shows examples of modified nucleotide triphosphates bearing an oligothiophene-based MLA.

Figure 8: Fig 8 shows other examples of modified nucleotide triphosphates bearing an oligothiophene-based MLA.

Figure 9: Fig 9 shows examples of modified nucleotide triphosphates bearing an oligophenyleneethynylene-based MLA.

Figure 10: Fig 10 shows other examples of modified nucleotide triphosphates bearing an oligophenyleneethynylene-based MLA.

DETAILED DESCRIPTION OF THE INVENTION

The invention describes methods for polynucleotide sequencing using Sequencing by Primer Extension (SPREX) and scanning probe microscopy (SPM) and methods for detecting nucleic acids. The invention describes also modified nucleotides that can be used for the aforementioned polynucleotide sequencing and nucleic acid detection.

The invention provides an advance over the current sequencing by synthesis methods in that it eliminates the time-consuming, repetitive and expensive steps inherent to sequencing by synthesis, in which the identity of the base has to be determined after each incorporation and the fluorophore and the blocking group on the 3' position of the sugar have to be cleaved before the next incorporation. In prior art, the use of fluorescent labels limits the density of the polynucleotides on the array and the read length of the polynucleotide, and ultimately affects the throughput. The invention allows the preparation of unprecedented ultra-high-density arrays of polynucleotides and the determination of all their base sequences at once.

By allowing single base identification, the invention provides also an advance over the scanning probe microscopy methods that have yet to achieve single-nucleotide resolution.

The nucleotide or nucleoside molecules of the invention each have a base that is linked to a molecular label (MLA) via a non-cleavable linker.

The base may be a purine, or a pyrimidine. The base can be a deazapurine. The molecule may have a ribose or deoxyribose sugar moiety. Although the nucleotide and nucleoside examples provided herein show the 2'-deoxyribose sugar moieties found in DNA, the invention applies also to nucleotides and nucleosides containing the ribose sugar moiety found in RNA.

Labelled nucleoside phosphoramidites of the invention that are suitable for oligonucleotide synthesis using automatic synthesisers have at the 5' position a dimethoxytrityl (DMT) protecting group and at the 3' position a 2-cyanoethyl-N,N-diisopropyl phosphoramidite moiety.

It will be apparent to those skilled in the art that other reactive groups may be used to obtain the same result. Those reactive groups are also within the scope of the invention.

Labelled nucleotides of the invention that may be substrate to a polymerase have a triphosphate moiety at their 5' position and a free OH at their 3' position.

Figure 1 and Figure 2 give a schematic view of the modified nucleotides and nucleosides of the invention. In these examples 2'-deoxyribose is used, but it should be noted that a hydroxyl group may be introduced at the 2' position of the sugar to include also the case of ribose.

Other synthetic nucleotide or nucleoside derivatives having modified base moieties and/or modified sugar moieties can be used in conjunction with the MLA described herein provided that they are capable of undergoing Watson-Crick base pairing. Such derivatives are described for example by Uhlman et al., Chemical Reviews 90: 543-584, 1990.

A preferred embodiment of the invention makes use of the more common purine and pyrimidine bases. In this instance, the linker connecting the base and the MLA is preferentially attached via the 7-position of the purine or the preferred deazapurine analogue, via an 8-modified purine, via an N-6 modified adenosine or an N-2 modified guanine. For pyrimidines, the attachment of the linker is preferably via the 5-position on cytidine, thymidine or uracil and the N-4 position on cytosine.

Although the invention will be further described with an emphasis on DNA, it should be noted that the descriptions will also be applicable to RNA, PNA, and other nucleic acids.

A variety of linkers may be used provided that they have the following characteristics.

They should hold the MLA at a sufficient distance from the nucleotide so as not to interfere with the activity of the enzyme and the Watson-Crick base pairing. They should also favour the stable immobilisation of the polynucleotides and may help the ordering of the MLA on the surface used for scanning probe microscopy.

Suitable linkers may include, but are not limited to, saturated or unsaturated alkyl chains, linear or cyclic alkyl chains, oxyethylene units and aromatic or heteroaromatic moieties.

Linkers may also be prepared using standard peptide synthesis techniques with any natural and synthetic amino acid building blocks. For example, commercially available 6-aminohexanoic acid may be incorporated in the linker.

Combinations of the above moieties may be used within one linker. The incorporation of oxyethylene units and/or of hydrophilic or charged naturally occurring or synthetic amino acids may enhance solubility in water.

The linker may be attached to the base through an aminopropyl, an aminopropenyl or an aminopropynyl moiety; the amino group provides a functionality that may react with the linker to form an amide bond for instance. The aminopropynyl moiety is preferred.

The methods of the present invention are different from prior art because they make use of non conventional molecular labels (MLA) that are not required to be fluorescent for the nucleosides and nucleotides.

Modified nucleotides or nucleosides are commonly labelled with fluorescent or luminescent groups to allow the detection of the polynucleotides incorporating them.

Examples of such modified nucleotides can be found in WO 2004018493, US 6573374 and US 20050170367.

Nucleotides may also be labelled with bulky groups selected from the group consisting of nanoparticles, carbon nanotubes, fullerenes, quantum dots and dendrimers as described in US 20050026163, US 20040219596, US 20050147981.

The labels in prior art do not allow the identification at once of all the bases in a polynucleotide strand. In the case of fluorescent or luminescent labels, the diffraction of light significantly reduces the density of detectable labels per surface unit and therefore limits the throughput. The large size of the bulky labels and their random orientation on a scanning surface do not allow their use for single-nucleotide detection and identification in a polynucleotide.

The MLA of the invention are designed to overcome the limitations of the labels of prior art and allow the identification at once of all the bases in all the polynucleotide molecules immobilised on a flat surface using SPM. In certain embodiments, the MLAs described herein allow unprecedented high densities of polynucleotides to be immobilised and sequenced.

The MLAs of the invention are also designed to enable the ultrasensitive detection of polynucleotides and nucleic acids.

In the study of linear macromolecular brush copolymers synthesised by atom transfer radical polymerisation, Matyjaszewski et al. (Chemical Reviews 2001, 101, 2921-2990) discovered that if the grafted chains of the polymer brush were sufficiently long, they could be individually visualised by atomic force microscopy (AFM).

Minko et al. (J. Am. Chem. Soc. 2005, 127, 15688-15689) recently reported the visualisation of single poly(2-vinylpyridine) molecules at the solid-liquid interface using AFM.

It follows that polymer chains grafted on polynucleotides may be visualised individually by AFM.

There are numerous reports of studies of the adsorption of molecules at the solid-air and solid-liquid interfaces using STM. Molecular and sub-molecular resolution are routinely achieved with π-electron-rich conjugated molecules including but not limited to oligothiophene (Langmuir 2003, 19, 3350-3356), oligophenylene ethynylene (Langmuir 2004, 20, 8892-8896), oligophenylene vinylene (Journal of Physical Chemistry B 2005, 109, 4290-4302) and phthalocyanines (Langmuir 2006, 22, 723-728).

Molecular resolution has also been achieved with long chain hydrocarbon molecules using STM.

The self-assembly properties of π-conjugated systems are well recognised (Chemical Reviews 2005, 105, 1491-1546).

The design of the MLAs of the invention is based on the recognition that molecular and sub-molecular resolutions are achieved with certain molecules under certain conditions when using SPM.

The present invention teaches the combination of these moieties that possess self-assembly and/or good immobilisation properties at surfaces and that can be detected by SPM at the molecular level with nucleotides and nucleoside to generate labelled oligonucleotides and polynucleotides useful for sequencing and detection of nucleic acids.

The first embodiments of the MLAs of the invention are oligopeptides and polypeptides containing naturally occurring or synthetic amino acids. In prior art, oligopeptides were only used as spacers between the base and a fluorophore but not as detectable labels.

MLAs of varying length may be prepared using standard peptide synthesis techniques with any amino acid building blocks. Their chemical and physical properties may be easily tailored by the appropriate choice of amino acids. The amino acids are chosen to enhance solubility, stable adsorption on the flat substrate and sub-molecular resolution by SPM.

Figure 6 gives examples of MLAs based on oligopeptides. These examples show that different moieties and amino acids may be combined to favour solubility, structural, electronic, self-assembly and adsorption properties. The examples are given as an illustration and not a limitation of the various structures of MLAs that may be obtained with amino acids.

As a general rule, any polymer or oligomer chain from 3 angstroms to 50 nm in length, more preferably from 1 to 10 nm in length, detectable by SPM may be used as MLA. A preferred example is polyelectrolyte and oligoelectrolyte chains; among these, rigid and rod-like chains are most preferred.

Other embodiments of the MLAs of the invention are monomers, oligomers or polymer chains containing π-electron-rich, conjugated or conducting moieties including but not limited to pyrrole, thiophene, phenyleneethynylene, phenylenevinylene, fluorene, thienylenevinylene, aniline, pyridine, acetylene, phthalocyanines and aromatic polycyclic compounds. These moieties may bear substituents to enhance their solubility in water and their stable adsorption and self-ordering at the solid-air or solid-liquid interfaces.

Figures 3, 4 and 5 give examples of chemical structures of these MLAs.

Different MLAs may be obtained by varying the length, chemical nature and structure of the above moieties. Other modifications apparent to one skilled in the art may be introduced as well after considering the entirety of the invention.

It will also be apparent to one skilled in the art that other molecules that are π-electron rich and/or conjugated may be substituted for the examples of MLA described above while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the present invention.

As a general point, monomers and oligomers are preferred to polymers because unlike the latter their properties can be easily tailored.

Although molecules containing π-electron-rich moieties are the preferred MLAs, saturated long chain hydrocarbon molecules that allow a strong adsorption on HOPG and sub-molecular resolution by STM may also be used as molecular labels.

To allow sequencing, each of the four bases bears a specific MLA enabling its unambiguous identification. The labelled nucleotides and nucleosides may be made with identical spacers and different MLAs. Alternatively, combinations of different spacers and different MLAs may be used.

The modified nucleotides and nucleosides bearing MLA are readily synthesised by one skilled in the art of organic synthesis.

A number of the substructures of the target modified nucleotides and nucleosides are available commercially. Modified nucleoside phosphoramidites and nucleotide triphosphates with an attached amino terminated spacer can be bought from a number of companies such as Eurogentec, Glen Research and others.

[Chemical structure not reproduced: example of a commercially available modified nucleotide.]

Some of the π-rich monomers, oligomers and polymers are also available commercially. A number of oligothiophenes, oligopyrroles and oligofurans bearing various substituents may be bought from Organic Electronic Chemicals.

[Chemical structure not reproduced: examples of commercially available conjugated heterocycles of thiophene, pyrrole and furan. X, Y, Z = S, NH, NR or O; R1-R8 = H, alkyl, Ar, OR, SR or any other substituent; n = 0-6.]

The syntheses of the other MLAs of the invention are known to one skilled in the art.

For a review of some of the syntheses see: Chemical Reviews 1999, 99, 1863-1933; Chemical Reviews 2000, 100, 1605-1644; Chemical Reviews 2000, 100, 2537-2574; Chemical Reviews 2005, 105, 1197-1279; Langmuir 2005, 21, 7860-7865; Tetrahedron, 2004, 60, 6285-6294.

The oligopeptide-based MLAs are readily synthesised by automated solid phase peptide synthesis from commercial amino acids.

The MLA may have a reactive moiety (for instance a carboxylic acid group) allowing its attachment to the rest of the modified nucleotide or nucleoside.

The skilled person will appreciate how to covalently link the different substructures to obtain the labelled nucleotides and nucleosides of the invention.

Scheme 1 shows an example of the synthesis of dUTP bearing a MLA. Uridine is shown as an example but the synthesis outline may be applied to the other bases.

The first step of the synthesis is the attachment of a linker (in this case 6-aminohexanoic acid) to the chosen MLA (in this case a substituted oligothiophene, 1), which is in the form of a reactive N-hydroxysuccinimide (NHS) ester. The second step is the activation of the carboxylic acid of the product of the first step. The last step is the covalent coupling of a known propargylamine derivative of dUTP, 2, to the linker-MLA adduct.

Compounds 1 and 2, as well as the aminohexanoic acid, are commercially available. n is an integer from 1 to 50, more preferably from 1 to 20, even more preferably from 1 to 6.

This synthesis example is provided by way of illustration. It will be obvious to those skilled in the art that various changes and modifications can be made to achieve the synthesis of the modified nucleotides and nucleosides bearing the MLA from non-commercial substructures that are readily prepared from commercial building blocks.

In situ scanning tunnelling microscopy of adsorbed inorganic transition metal complexes and metallo-phthalocyanines allows molecular and sub-molecular resolution of the adsorbates, as described for instance in Langmuir 2004, 20, 3159-3165 and Langmuir 2006, 22, 2105-2111. These metal complexes, as well as other redox-active compounds known to the skilled person, are used as molecular labels in some embodiments of the invention.

Scheme 1 [Chemical scheme not reproduced: attachment of the 6-aminohexanoic acid linker to the MLA NHS ester 1, acid activation, and coupling of the activated linker-MLA adduct to the propargylamine derivative of dUTP 2.]

Figure 7, Figure 8, Figure 9 and Figure 10 give some examples of modified nucleotide triphosphates of the invention.

In these figures, the replacement of the triphosphate moiety by a dimethoxytrityl group (DMT) and the attachment of a cyanoethyl phosphoramidite moiety to the 3' hydroxyl yield the corresponding nucleoside phosphoramidites useful for the synthesis of oligonucleotides.

The invention provides a method of sequencing by primer extension (SPREX) of target polynucleotides using the modified nucleotide triphosphates described above and suitable polymerases.

The target polynucleotides sequenced may include genomic DNA, cDNA, RNA and other naturally occurring or synthetic nucleic acids.

When the target polynucleotide is genomic DNA, it is purified using standard methods.

The genomic DNA may be first amplified by PCR or directly fragmented using suitable enzymes or other forms of fragmentation (i.e. chemical or mechanical). One of the strands of the DNA fragment is either ligated to a hairpin oligonucleotide which provides a self-priming moiety or is ligated to an oligonucleotide with a sequence complementary to the primers to be used. Single stranded DNA fragments are then generated by denaturation. The details of these techniques as well as other techniques capable of achieving the same results will be apparent to the skilled person.

To carry out the primer extension reaction the first step is usually to anneal a primer sequence to the target polynucleotide. Alternatively, the primer and the target polynucleotide may be part of the same molecule when the target polynucleotide was ligated to a hairpin loop structure using a ligation reagent such as a ligase enzyme.

The primer sequence is recognised by the polymerase enzyme and acts as an initiation point for the extension of the complementary strand. Preferably, all four modified nucleotides (A, C, T, G) are then brought into contact with the target polynucleotide simultaneously, to allow the synthesis of the complementary strand.

Other conditions necessary for carrying out the polymerase reaction, including temperature, pH, buffer compositions etc., will be apparent to those skilled in the art.

Many different polymerase enzymes may be used for primer extension, and it will be apparent to the person of ordinary skill which is most appropriate to use. Preferred enzymes include but are not limited to Taq polymerase, Vent (exo-) polymerase, Pwo, DNA polymerase I, polymerase III, Klenow fragment, Iii polymerase and ThermalAce polymerase. Examples of such appropriate polymerases are disclosed in Nucleic Acids Research, 2003, 31, 2360-2365; Nucleic Acids Research, 2003, 31, 2636-2646; Journal of the American Chemical Society 2005, 127, 15071-15082; Proc. Natl. Acad. Sci. USA, 1996 (93), pp 5281-5285; Nucleic Acids Research, 1999 (27), pp 2454-2553 and Nucleic Acids Research, 2002 (30), pp 605-613.

Other polymerases genetically modified to improve their incorporation of the modified nucleotides may also be used.

Primer extension of a large number of polynucleotides may be performed in solution.

In the case of amphiphilic nucleotide triphosphates containing hydrophilic and hydrophobic substructures (for example a hydrophilic spacer and a hydrophobic molecular label), a mixture of water and organic solvent may be used. Primer extension may also be performed in emulsions and vesicles. In one embodiment, the inner medium of the droplet is aqueous and contains the target polynucleotide and a polymerase. The hydrophilic part of the nucleotide triphosphate containing the base is solubilised in the aqueous medium (and is available for the polymerase reaction) while its hydrophobic part is solubilised in the continuous organic medium around the aqueous droplet.

The duplexes obtained by any of the above protocols or other protocols known in the art may be purified using methods such as gel filtration and alcohol precipitation.

Other purification methods will be apparent to the skilled person. The purified duplexes bearing the MLA are then immobilised on an atomically flat surface at a very high density before scanning with SPM. The complementary strands are immobilised either as duplexes or as single strands.

The ultra high density of immobilised polynucleotides is an important feature of the invention. It is estimated that an entire genome could be immobilised on a flat surface a few centimetres square and analysed. The density achieved here would be several orders of magnitude larger than the densities achieved in the prior art.

Alternatively, the target polynucleotides may be immobilised on the flat surface prior to primer extension via any suitable covalent or non-covalent linkage, of which many are known in the art. In this case the unincorporated labelled nucleotides can be washed off along with other impurities before scanning.
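By way of illustration only, the order of magnitude of this estimate can be checked with a short calculation. The sketch below assumes straight duplex strands laid side by side; the genome size, the lane width between neighbouring strands and the packing fraction are all illustrative assumptions and are not values taken from this description. Only the axial rise of about 0.34 nm per base pair is a standard figure for B-form DNA.

```python
# Rough footprint estimate for a genome laid out as straight duplex strands.
# Every numeric input passed to footprint_cm2 is an illustrative assumption.

RISE_PER_BP_NM = 0.34  # axial rise per base pair of B-form DNA


def footprint_cm2(genome_bp: float, lane_width_nm: float,
                  packing_fraction: float = 1.0) -> float:
    """Return the surface area (cm^2) needed to lay out genome_bp base pairs.

    lane_width_nm: assumed centre-to-centre spacing between neighbouring
        strands, wide enough for the backbone and its molecular labels.
    packing_fraction: assumed fraction of the surface actually covered, in (0, 1].
    """
    if not 0 < packing_fraction <= 1:
        raise ValueError("packing_fraction must be in (0, 1]")
    contour_length_nm = genome_bp * RISE_PER_BP_NM
    area_nm2 = contour_length_nm * lane_width_nm / packing_fraction
    return area_nm2 * 1e-14  # 1 nm^2 = 1e-14 cm^2


if __name__ == "__main__":
    # Assumed: ~3.2e9 bp genome, 25 nm lanes, only 1% of the surface covered.
    print(f"{footprint_cm2(3.2e9, 25.0, 0.01):.4f} cm^2")
```

Under these deliberately loose assumptions the footprint comes out well below one square centimetre, which is consistent with the scale stated above.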

For example, the target polynucleotides already ligated to a hairpin may be attached covalently to the surface by the reaction of a chemical functionality on the hairpin (i.e. a nucleophile) that is complementary to a chemical functionality (electrophile) on the surface. Alternatively the hairpin may bear a biotin moiety that will interact with streptavidin molecules adsorbed on the surface. Other immobilisation strategies will be apparent to the skilled person.

Where the target polynucleotide has portions complementary to the primers to be used, the latter may be first immobilised on the surface as described for the hairpin, followed by the hybridisation of the target polynucleotide to the surface-bound primers.

Examples of the immobilisation surfaces include but are not limited to freshly cleaved muscovite mica (chemically modified or not), graphite, ultra flat metallic surfaces grown on mica (e.g. Au(111)) and highly oriented pyrolytic graphite (HOPG). The immobilisation surface may first be modified to be neutral, positively or negatively charged using methods including but not limited to silanization, self-assembled monolayers (SAMs), lipid bilayers, Langmuir-Blodgett films, polyelectrolyte deposition and metal ions.

In one embodiment, the polynucleotides may be co-immobilised on the surface with suitable molecules chosen for example to impart stability, enhance detection of the MLA or to introduce any other useful property.

In other embodiments, intercalators such as ethidium bromide may be used to increase the length of the nucleic acid strand and augment the spatial separation of the MLA decorating the polynucleotide.

Scheme 2

Starting materials: polymerase, genomic DNA fragments ligated to hairpin, MLA-dNTPs in polymerase buffer.

1) Primer extension: synthesis of complementary strands to target polynucleotides.

2) Purification of the duplex obtained (native PAGE (15%), alcohol precipitation).

3) Ultra high density immobilisation of the polynucleotides labelled with MLAs on a flat surface (e.g. mica, HOPG).

4) Scanning of the surface using SPM and recording of a topographic image.

[Drawing not reproduced: small portion of the recorded image, showing the polynucleotide backbone and the MLAs.]

Scheme 2 shows, by way of illustration and not limitation, the steps involved in the sequencing of target genomic DNA.

In the case of nucleic acids detection, the first step is the synthesis of single stranded oligonucleotide probes containing several of the labelled nucleotides described herein.

The oligonucleotide probes have sequences complementary to the target sequences to be detected. The target sequences are from any synthetic or naturally occurring nucleic acids (e.g. nucleic acids from pathogens).

In the second step, the single stranded probes are contacted with target nucleic acid mixtures under conditions favouring hybridisation.

The third step is the immobilisation of the products of the reaction on a flat surface using methods known in the art.

The fourth step is the scanning of the flat substrate using scanning probe microscopy and analysis of the images obtained to identify hybridisation events.

In an alternative, the single stranded probes may be immobilised on the flat substrate prior to the hybridisation experiment with target polynucleotides.

The distribution of the MLA on the probe is such that hybridisation to a complementary polynucleotide induces structural changes in the appearance of the probe that are detectable by SPM.

For example, the identification of hybridisation events may be based on the observation of the ordering of the MLA of each probe. In the absence of hybridisation, single stranded oligonucleotides are easily bent, so the attached MLA may be disordered; upon hybridisation with a complementary strand and formation of a duplex, an ordering of the MLA may be observed.
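One possible way of quantifying such ordering is sketched below, purely as an illustration. The description does not prescribe any particular measure; the sketch assumes that an in-plane orientation angle has already been extracted from the image for each MLA of a probe, and it uses the standard two-dimensional nematic order parameter. The function names and the decision threshold are hypothetical.

```python
import math


def orientational_order(angles_deg):
    """2D nematic order parameter of label orientations.

    Returns 1 for perfectly parallel labels and a value near 0 for
    disordered labels. Angles are doubled so that labels pointing at
    10 degrees and at 190 degrees count as parallel.
    """
    if not angles_deg:
        raise ValueError("at least one label angle is required")
    n = len(angles_deg)
    mean_cos = sum(math.cos(2 * math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(2 * math.radians(a)) for a in angles_deg) / n
    return math.hypot(mean_cos, mean_sin)


def looks_hybridised(angles_deg, threshold=0.8):
    """Flag a probe as hybridised when its labels are well ordered.

    The 0.8 threshold is a placeholder that would need calibration
    against probes of known hybridisation state.
    """
    return orientational_order(angles_deg) >= threshold
```

A probe whose labels lie at 85, 88, 92 and 90 degrees would be flagged as ordered, whereas labels at 0, 45, 90 and 135 degrees give an order parameter of essentially zero.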

The nucleic acid detection may be practised using a very high number of probes.

Each probe has a specific coding system allowing its unambiguous identification. The identification code may be obtained by varying the number and relative position of MLAs on the oligonucleotide probes or by using MLAs of various lengths or chemical nature. The examples given are for the purpose of explanation and not limitation; other ways of tagging the probes in a specific way using the MLA will be evident to the skilled person.
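The size of the code space obtainable in this way can be illustrated with a minimal sketch. It assumes a probe with a fixed number of positions that may each be left unlabelled or carry one of several distinguishable MLA types, and it requires at least two labels per probe. The function name, the label names and the assumption that the probe orientation can be read from the image (so that mirrored codes remain distinct) are hypothetical.

```python
from itertools import product


def probe_codes(n_sites: int, label_types: tuple[str, ...], min_labels: int = 2):
    """Yield every identification code for a probe with n_sites positions.

    Each code is a tuple of length n_sites; None marks an unlabelled
    position and a string names the MLA type attached at that position.
    Codes with fewer than min_labels labels are skipped.
    """
    options = (None,) + tuple(label_types)
    for code in product(options, repeat=n_sites):
        if sum(site is not None for site in code) >= min_labels:
            yield code


if __name__ == "__main__":
    codes = list(probe_codes(4, ("short", "long")))
    print(len(codes), "distinct codes, for example", codes[0])
```

With four positions and two label types this enumeration already gives 72 distinct codes, and the number grows rapidly with the number of positions.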

Scanning probe microscopy techniques used to scan the samples include but are not limited to STM, ECSTM, SECM, AFM, and SNOM. The samples are scanned at the solid/air or solid/liquid interfaces in ambient conditions or at low temperatures using commercially available or custom-built instruments. A wealth of experimental conditions and set-ups apparent to the skilled person may be used. An illustration of experimental conditions that may be used is given in Example 4.

Example 1

Treatment of mica with positively charged polyelectrolytes.

A 3 g/ml solution in water of polyallylamine hydrochloride from Aldrich is prepared. A drop of this solution is deposited on a freshly cleaved mica surface so as to cover the whole surface. Incubation is performed in a humid chamber for 30 minutes, then the mica surface is rinsed gently with Milli-Q water. The treated mica surface is dried in a nitrogen stream and used immediately for polynucleotide immobilisation.

The same protocol is used for other positively charged polyelectrolytes including but not limited to linear and branched polyethyleneimine, poly(diallyl dimethyl ammonium chloride).

Example 2

Preparation of lipid bilayer on a mica surface.

A 0.5 mg/ml solution of dipalmitoyl-trimethyl-ammoniumpropane in 20 mM NaCl at pH 6 is prepared. This solution is sonicated repeatedly to obtain small unilamellar vesicles. Clean bilayers are formed when a droplet of the vesicle solution is allowed to incubate overnight on a freshly cleaved mica surface which is then heated at 70 °C for minutes.

Example 3

Primer extension.

A sample of genomic DNA purified from a blood sample is subjected to one of several known methods to fragment it into 1000 bp polynucleotide portions. One strand of each polynucleotide is ligated to a hairpin polynucleotide by a ligase enzyme and the other strand is removed after the ligation to yield the target polynucleotide to be sequenced. The details of this method are known to those skilled in the art.

In this example, labelled 2'-deoxynucleotide triphosphates bearing oligothiophene-based MLAs of the type described in Figure 7b are used: n = 2 for A, n = 3 for T, n = 4 for G, n = 5 for C. Thermostable inorganic pyrophosphatase (2 U), Vent (exo-) DNA polymerase (10 U), the target polynucleotides (diluted to a total concentration of 10 pM), and the four labelled dNTPs (final concentration 250 µM each) are mixed in 20 µl of the polymerase reaction buffer (provided by the supplier of the DNA polymerase). The extension reaction is performed by heating the mixture to the optimal temperature of the enzyme in a thermocycler for 1 hour. The reaction is stopped by adding 60 µl of an 80% formamide solution containing 20 mM EDTA and heating at 99 °C for 10 minutes.

The duplexes obtained are purified by native polyacrylamide gel electrophoresis (PAGE, 15%) at 4 °C, extracted from the gel and purified by precipitation with ethanol twice.
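The quantities implied by the concentrations of this example can be worked out as follows. This is an illustrative calculation only; it assumes that the stated concentrations refer to the 20 µl reaction volume and that every target is a full 1000-nucleotide fragment copied from end to end. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def reaction_amounts(volume_l=20e-6, dntp_molar=250e-6,
                     target_molar=10e-12, target_length_nt=1000):
    """Return amounts implied by the Example 3 concentrations.

    target_length_nt assumes each target is a full 1000-nucleotide fragment.
    """
    dntp_mol_each = dntp_molar * volume_l
    target_mol = target_molar * volume_l
    # Nucleotides consumed if every target is copied end to end (assumption).
    nucleotides_needed_mol = target_mol * target_length_nt
    return {
        "dntp_mol_each": dntp_mol_each,
        "target_molecules": target_mol * AVOGADRO,
        "dntp_fold_excess": 4 * dntp_mol_each / nucleotides_needed_mol,
    }


if __name__ == "__main__":
    for name, value in reaction_amounts().items():
        print(f"{name}: {value:.3g}")
```

Under these assumptions the reaction contains about 5 nmol of each labelled dNTP and roughly 1.2 x 10^8 target molecules, so the labelled nucleotides are present in an excess of about 10^5-fold over the amount needed for complete extension.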

Example 4

The polynucleotide duplexes obtained in Example 3 are dissolved in an appropriate buffer (e.g. 10 mM TRIS, 1 mM EDTA). The concentration of polynucleotides in solution is chosen to obtain the desired high density coverage of the surface. Typical polynucleotide concentrations are in the range of 1 ng/ml. A volume of this solution sufficient to cover the pre-treated mica described in Example 1 is allowed to incubate on top of the surface for one hour. The surface is then gently rinsed with Milli-Q water and dried under nitrogen gas. The surface is then scanned using Hydration Scanning Tunneling Microscopy (HSTM) in humid air. This technique is based on the electrical conductivity of molecularly thin water layers which adsorb to the sample surfaces in a humid atmosphere. It allows reliable imaging of biological specimens and even insulators as long as they are hydrophilic.

HSTM is carried out in a humid atmosphere using a low-current scanning tunneling microscope (e.g. from RHK, Rochester Hills, MI). Mechanically cut and electrochemically etched Pt/Ir (platinum/iridium) tips are used. The surface is scanned either in a constant height mode or in a constant current mode. Settings for the tunneling current and the bias voltage range from 0.05 to 1 nA and from -0.1 to -0.9 V respectively.

A topographic image of the surface is obtained as schematically illustrated in Scheme 2 in which only a very small portion of the surface (with dimensions in the nanometer range) is represented. An analysis of the image by measuring the length and ordering of the surface features provides the sequences of all the polynucleotides immobilised on the surface.
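The image analysis step may be illustrated by the following minimal sketch, which is not the analysis procedure of the invention but one hypothetical way of turning measured label lengths into a sequence. It uses the label assignment of Example 3 (n = 2 for A, 3 for T, 4 for G, 5 for C) and assumes that the label lengths have already been measured in 5' to 3' order along one synthesised strand. The length added per thiophene unit and the spacer offset are placeholder values that would need calibration; all function names are hypothetical.

```python
LABEL_UNITS_TO_BASE = {2: "A", 3: "T", 4: "G", 5: "C"}  # assignment of Example 3
UNIT_LENGTH_NM = 0.39  # assumed length added per thiophene unit (placeholder)
OFFSET_NM = 1.0        # assumed constant spacer contribution (placeholder)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def call_base(label_length_nm: float) -> str:
    """Map one measured label length to the base carrying that label."""
    units = round((label_length_nm - OFFSET_NM) / UNIT_LENGTH_NM)
    if units not in LABEL_UNITS_TO_BASE:
        raise ValueError(f"length {label_length_nm} nm matches no label")
    return LABEL_UNITS_TO_BASE[units]


def read_synthesised_strand(label_lengths_nm) -> str:
    """Call bases for label lengths listed 5' to 3' along the new strand."""
    return "".join(call_base(length) for length in label_lengths_nm)


def target_sequence(synthesised: str) -> str:
    """The target is the reverse complement of the synthesised strand."""
    return "".join(COMPLEMENT[base] for base in reversed(synthesised))


if __name__ == "__main__":
    strand = read_synthesised_strand([1.80, 2.15, 2.60, 2.90])
    print(strand, target_sequence(strand))
```

In practice the calling step would also have to handle measurement noise, overlapping labels and strands imaged in unknown orientation, none of which is addressed in this sketch.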

Claims


What is claimed is:
1. A method for determining the sequence of polynucleotides comprising the following steps:

(i) Synthesis in a single step and in solution of complementary strands to target single stranded polynucleotides using suitable primers, polymerases and the modified nucleotides described herein.

(ii) Immobilisation at a very high density of the double or single stranded newly synthesised polynucleotides onto a flat surface.

(iii) Scanning of the flat substrate containing a very high density of the synthesised polynucleotide using scanning probe microscopy techniques including but not limited to Atomic Force Microscopy (AFM), Scanning Tunnelling Microscopy (STM), Electrochemical STM (ECSTM), Scanning Electrochemical Microscopy (SECM) and Scanning Near field Optical Microscopy (SNOM).

(iv) Determination of the base sequences of all the immobilised polynucleotides on the substrate by analysing the scanning image revealing the labels of the incorporated modified nucleotides.
2. A method for detecting nucleic acids based on the following steps:

(i) Synthesis of polynucleotide probes using at least two modified nucleotides described herein.

(ii) Contacting the probes with unknown nucleic acids in conditions favouring hybridisation in solution or on a flat surface.

(iii) Immobilisation of the products of the solution hybridisation on a flat surface.

(iv) Scanning of the flat surface using Scanning Probe Microscopy (SPM) and identifying the hybridised probes by analysis of the image obtained by scanning.
3. A method according to claim 1, wherein the synthesis of complementary strands is performed on target polynucleotides already immobilised at high density on a flat substrate.
4. The method of claim 1, wherein the modified nucleotides are deoxyribonucleotide triphosphates or ribonucleotide triphosphates.
5. The method of claim 2, wherein the modified nucleotides are deoxyribonucleotide phosphoramidites or ribonucleotide phosphoramidites.

6. The method of claim 2, wherein the synthetic polynucleotide probes are immobilised on a flat surface prior to hybridisation with target nucleic acids.
7. A method according to claims 1 and 2, wherein the modified nucleotides incorporate non-natural bases and/or non natural sugar moieties.
8. A method according to claims 1 and 2, wherein the modified nucleotide has a base that has an attached molecular label with a length from 3 angstroms to 100 nanometers and more preferably from 1 to 10 nanometers.
9. A method according to claim 8, wherein the base is attached to the molecular label via a non-cleavable linker.
10. A method according to claims 1 and 2, wherein the modified nucleotide has a base that is linked to a molecular label that is an alkyl, alkenyl or aryl chain between 6 and 50 repeat units and preferably between 10 and 30 repeat units.
11. A method according to claims 1 and 2, wherein the modified nucleotide has a base that is linked to a molecular label that is an oligopeptide or a polypeptide incorporating natural and non-natural amino acids, an oligo pseudo peptide or poly pseudo peptide incorporating natural and non-natural amino acids.
12. The method of claim 11, wherein the oligopeptide, the polypeptide, the oligo pseudo peptide and the poly pseudo peptide form a polyproline II helix (PPII).
13. A method according to claims 1 and 2, wherein the modified nucleotide has a base that is linked to a molecular label that is a neutral or charged oligomer or polymer.
14. A method according to claims 1 and 2, wherein the modified nucleotide has a base that is linked to a molecular label that is a polyelectrolyte, more preferably a rigid and rod-like polyelectrolyte.
15. A method according to claims 13 and 14, wherein the neutral or charged oligomer or polymer or polyelectrolyte contains aromatic and/or π-electron rich moieties.
16. A method according to claims 1 and 2, wherein the modified nucleotide has a base that is linked to a molecular label that is a conjugated or conducting or semi-conducting oligomer or polymer chain based on but not limited to the following moieties: pyrrole, thiophene, phenylene ethynylene, phenylene vinylene, fluorene, aniline, pyridine and acetylene.
17. A method according to claims 1 and 2, wherein the modified nucleotide has a base that is linked to a molecular label comprising an electro-active moiety including but not limited to ferrocenyl compounds, transition metal complexes, metallo-phthalocyanine and porphine.
18. A method according to claims 1 and 2, wherein the molecular label further reacts with chemical groups on the flat surface on adsorption of the labelled polynucleotides.
19. A method according to claims 1 and 2, wherein the molecular label on the modified nucleotide allows a stable adsorption of the labelled polynucleotides on the flat substrate used.
20. A method according to claims 1 and 2, wherein the molecular label on the modified nucleotide can be visualised by SPM.
21. A method according to claims 1 and 2, wherein the molecular labels of at least two consecutive bases of the modified polynucleotide can be resolved by SPM.
22. A method according to claim 1, wherein each of the four bases of the modified nucleotides has a distinctive molecular label allowing its identification by SPM.
23. A method according to claim 1, wherein all the molecular labels on a synthesised complementary strand can be visualised by SPM.
24. A method according to claims 1 and 2, wherein the flat surface or substrate is selected from a group comprising but not limited to mica, highly oriented pyrolytic graphite (HOPG), ultraflat metallic surfaces grown on HOPG such as but not limited to Au(111) and Cu(111).
25. A method according to claim 24, wherein the flat surface or substrate is chemically modified prior to the adsorption of polynucleotides.