US20230374567A1

US20230374567A1 - Method for modifying a template double stranded polynucleotide

Info

Publication number: US20230374567A1
Application number: US18/194,062
Authority: US
Inventors: David Jackson Stoddart; James White
Original assignee: Oxford Nanopore Technologies PLC
Current assignee: Oxford Nanopore Technologies PLC
Priority date: 2016-05-25
Filing date: 2023-03-31
Publication date: 2023-11-23
Also published as: EP3464615A1; US20190194722A1; CN109219665A; WO2017203267A1; GB201609220D0; EP3464615B1; US11649480B2; CN109219665B

Abstract

The invention relates to a method for modifying a template double stranded polynucleotide, especially for characterisation using nanopore sequencing. The method produces from the template a plurality of modified double stranded polynucleotides. These modified polynucleotides can then be characterised.

Description

FIELD OF THE INVENTION

The invention relates to a method for modifying a template double stranded polynucleotide, especially for characterisation using nanopore sequencing.

BACKGROUND OF THE INVENTION

There are many commercial situations which require the preparation of a nucleic acid library. This is frequently achieved using a transposase. Depending on the transposase which is used to prepare the library it may be necessary to repair the transposition events in vitro before the library can be used, for example in sequencing.
There is currently a need for rapid and cheap polynucleotide (e.g. DNA or RNA) sequencing and identification technologies across a wide range of applications. Existing technologies are slow and expensive mainly because they rely on amplification techniques to produce large volumes of polynucleotide and require a high quantity of specialist fluorescent chemicals for signal detection.
Transmembrane pores (nanopores) have great potential as direct, electrical biosensors for polymers and a variety of small molecules. In particular, recent focus has been given to nanopores as a potential DNA sequencing technology.
When a potential is applied across a nanopore, there is a change in the current flow when an analyte, such as a nucleotide, resides transiently in the barrel for a certain period of time. Nanopore detection of the nucleotide gives a current change of known signature and duration. In the strand sequencing method, a single polynucleotide strand is passed through the pore and the identities of the nucleotides are derived. Strand sequencing can involve the use of a molecular brake to control the movement of the polynucleotide through the pore.
International Application No. PCT/GB2014/052505 (published as WO 2015/022544) discloses using a MuA transposase and a population of MuA substrates to produce a plurality of shorter, modified double stranded polynucleotides from a template double stranded polynucleotide. The modified polynucleotides can be designed such that they are each easier to characterise, such as by strand sequencing, than the original template polynucleotide. The MuA transposase is inactivated by heat.

SUMMARY OF THE INVENTION

The invention relates to a method for modifying a template double stranded polynucleotide, especially for characterisation using nanopore sequencing. The method produces from the template a plurality of modified double stranded polynucleotides. These modified polynucleotides can then be characterised.
The inventors have surprisingly demonstrated that it is possible to remove a MuA transposase from modified polynucleotides using a translocase. This avoids the need to heat inactivate the MuA transposase, which may also inactivate any other enzymes or proteins being used in the preparation or characterisation of the modified polynucleotides. Removing the heat inactivation step also dispenses with the need for additional equipment such as a thermal cycler or water bath, used for heating up the sample.
The invention therefore provides a method for modifying a template double stranded polynucleotide, comprising:

- (a) contacting the template polynucleotide with a MuA transposase and a population of double stranded MuA substrates each comprising an overhang at one or both ends of one strand such that the transposase fragments the template polynucleotide and ligates a substrate to one or both ends of the double stranded fragments, thereby producing a plurality of fragment/substrate constructs; and
- (b) using a translocase to remove the MuA transposases from the constructs, thereby producing a plurality of modified double stranded polynucleotides.

DESCRIPTION OF THE FIGURES

FIG. 1 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. No PhiX peak was observed between the upper and lower markers for transpososome 1 (labelled 1) or transpososome 2 (labelled 2) when incubated at room temp in the absence of an enzyme.

FIG. 2 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. A PhiX peak was observed between the upper and lower markers for transpososome 1 (labelled 1) when incubated at 75° C. for 10 minutes.

FIG. 3 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. A PhiX peak was observed between the upper and lower markers for transpososome 2 (labelled 1) when incubated at 75° C. for 10 minutes.

FIG. 4 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. A PhiX peak was not observed between the upper and lower markers for transpososome 1 (labelled 1) when incubated with Hel308Mbu-E284C/S615C-STrEP(C) (SEQ ID NO: 10 with mutations E284C/S615C with a streptavidin tag attached at its C terminus).

FIG. 5 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. A PhiX peak was observed between the upper and lower markers for transpososome 2 (labelled 1) when incubated with Hel308Mbu-E284C/S615C-STrEP (SEQ ID NO: 10 with mutations E284C/S615C with a streptavidin tag attached at its C terminus).

FIG. 6 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. A PhiX peak was observed between the upper and lower markers for transpososome 2 (labelled 1) when incubated with either A) Hel308Mbu-E284C/S615C-STrEP(C) (SEQ ID NO: 10 with mutations E284C/S615C with a streptavidin tag attached at its C terminus) or B) at 75° C. for 10 minutes. The trace compares PhiX with transpososome 2 treated with heat and treated with Hel308. Red is heat treated, blue is Hel308 treated.

FIG. 7 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. Line 1 corresponds to control sample (i) which has been incubated at room temperature in the absence of a translocase. No tagmentation peak was observed for sample (i). Line 2 corresponds to sample (ii) which has been incubated at 75° C. A tagmentation peak was observed between the upper and lower markers with sample (ii).

FIG. 8 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. Line 1 corresponds to control sample (i) which has been incubated at room temperature in the absence of a translocase. No tagmentation peak was observed for sample (i). Line 3 corresponds to sample (iii) which has been incubated at room temperature with Hel308Mbu-E284C-STrEP(C) (SEQ ID NO: 10 with mutation E284C with a streptavidin tag attached at its C terminus). A tagmentation peak was observed between the upper and lower markers with sample (iii).

FIG. 9 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. Line 1 corresponds to control sample (i) which has been incubated at room temperature in the absence of a translocase. No tagmentation peak was observed for sample (i). Line 4 corresponds to sample (iv) which has been incubated at room temperature with T4 Dda-(E94C/F98W/C109A/C136A/A360C) (SEQ ID NO: 97 with mutations E94C/F98W/C109A/C136A/A360C and then (ΔM1)G1G2 (where (ΔM1)G1G2=deletion of M1 and then addition of G1 and G2)). A tagmentation peak was observed between the upper and lower markers with sample (iv).

FIG. 10 shows an Agilent 2100 Bioanalyser trace. The lower marker is labelled X and the upper marker is labelled Y. Line 1 corresponds to control sample (i) which has been incubated at room temperature in the absence of a translocase. No tagmentation peak was observed for sample (i). Line 5 corresponds to sample (v) which has been incubated at room temperature with UvrD Eco-(E117C/M380C)-STrEP (SEQ ID NO: 122 with mutations E177C/M380C with a streptavidin tag attached at the C terminus). A tagmentation peak was observed between the upper and lower markers with sample (v).

FIG. 11 shows a bar chart of throughput (y-axis label=kb/nanopore/hr) for samples 1-3 (sample 1=incubation at room temperature with Hel308Mbu-E284C/S615C-STrEP(C) using transpososome with 3′ overhang, sample 2=incubation at 75° C. for 10 minutes and sample 3=incubation at room temp in absence of Hel308Mbu-E284C/S615C-STrEP(C)).

FIG. 12 shows a cartoon representation of a translocase being used to remove a MuA transposase from a construct. The MuA transposase (labelled A) is bound to a double stranded MuA substrate (labelled B) which has two overhangs labelled C at each end of one of the strands. In step 1 the MuA fragments the template polynucleotide and ligates a double stranded MuA substrate to one end producing construct D. In step 2 the translocase (labelled E) was allowed to bind to the construct at one of the overhangs. In step 3 the translocase removes the MuA from the construct producing a modified double stranded polynucleotide. In step 4 a leader was attached to the double stranded polynucleotide which had an enzyme (labelled F) pre-bound which was capable of controlling the movement of the polynucleotide through a nanopore.

It is to be understood that the Figures are for the purpose of illustrating particular embodiments of the invention only, and are not intended to be limiting.

DESCRIPTION OF THE SEQUENCE LISTING

SEQ ID NO: 1 shows the codon optimised polynucleotide sequence encoding the MS-B1 mutant MspA monomer. This mutant lacks the signal sequence and includes the following mutations: D90N, D91N, D93N, D118R, D134R and E139K.
SEQ ID NO: 2 shows the amino acid sequence of the mature form of the MS-B1 mutant of the MspA monomer. This mutant lacks the signal sequence and includes the following mutations: D90N, D91N, D93N, D118R, D134R and E139K.
SEQ ID NO: 3 shows the polynucleotide sequence encoding one monomer of α-hemolysin-E111N/K147N (α-HL-NN; Stoddart et al., PNAS, 2009; 106(19): 7702-7707).
SEQ ID NO: 4 shows the amino acid sequence of one monomer of α-HL-NN.
SEQ ID NOs: 5 to 7 show the amino acid sequences of MspB, C and D.
SEQ ID NO: 8 shows the amino acid sequence of the Hel308 motif.
SEQ ID NO: 9 shows the amino acid sequence of the extended Hel308 motif.
SEQ ID NOs: 10 to 58 show the amino acid sequences of Hel308 helicases in Table 1.
SEQ ID NO: 59 shows the RecD-like motif I.
SEQ ID NOs: 60 to 62 show the extended RecD-like motif I.
SEQ ID NO: 63 shows the RecD motif I.
SEQ ID NO: 64 shows a preferred RecD motif I, namely G G P G T G K T.
SEQ ID NOs: 65 to 67 show the extended RecD motif I.
SEQ ID NO: 68 shows the RecD-like motif V.
SEQ ID NO: 69 shows the RecD motif V.
SEQ ID NOs: 70 to 77 show the MobF motif III.
SEQ ID NOs: 78 to 84 show the MobQ motif III.
SEQ ID NO: 85 shows the amino acid sequence of TraI Eco.
SEQ ID NO: 86 shows the RecD-like motif I of TraI Eco.
SEQ ID NO: 87 shows the RecD-like motif V of TraI Eco.
SEQ ID NO: 88 shows the MobF motif III of TraI Eco.
SEQ ID NO: 89 shows the XPD motif V.
SEQ ID NO: 90 shows XPD motif VI.
SEQ ID NO: 91 shows the amino acid sequence of XPD Mbu.
SEQ ID NO: 92 shows the XPD motif V of XPD Mbu.
SEQ ID NO: 93 shows XPD motif VI of XPD Mbu.
SEQ ID NO: 94 shows the polynucleotide sequence of the double stranded portion of a MuA substrate of the invention.
SEQ ID NO: 95 shows the polynucleotide sequence of the double stranded portion of a MuA substrate of the invention. This sequence is complementary to SEQ ID NO: 94 except that it contains a U at the 3′ end.
SEQ ID NO: 96 shows the polynucleotide sequence of the overhang strand of the double stranded MuA substrate of the invention.
SEQ ID NO: 97 shows the amino acid sequence of Dda 1993.
SEQ ID NOs: 98 to 112 show the amino acid sequences of other Dda helicases for use in the invention.
SEQ ID NO: 113 shows the codon optimised polynucleotide sequence encoding the wild-type CsgG monomer from Escherichia coli Str. K-12 substr. MC4100. This monomer lacks the signal sequence.
SEQ ID NO: 114 shows the amino acid sequence of the mature form of the wild-type CsgG monomer from Escherichia coli Str. K-12 substr. MC4100. This monomer lacks the signal sequence. The abbreviation used for this CsgG=CsgG-Eco.
SEQ ID NOs: 115 to 121 show polynucleotide sequences used in the Examples.
SEQ ID NO: 122 shows the amino acid sequence of UvrD-Eco wild-type.
It is to be understood that the sequences are not intended to be limiting.

DETAILED DESCRIPTION OF THE INVENTION

It is to be understood that different applications of the disclosed products and methods may be tailored to the specific needs in the art. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only, and is not intended to be limiting.
In addition as used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a polynucleotide” includes “polynucleotides”, reference to “a substrate” includes two or more such substrates, reference to “a transmembrane protein pore” includes two or more such pores, and the like.
In this specification, where different amino acids at a specific position are separated by the symbol “/”, the “/” symbol means “or”. For instance, P108R/K means P108R or P108K. In this specification, where different positions or different substitutions are separated by the symbol “/”, the “/” symbol means “and”. For example, E94/P108 means E94 and P108, and E94D/P108K means E94D and P108K.
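One way to tell the two readings of “/” apart mechanically is to look at what follows the slash: a bare amino acid letter can be read as an alternative at the position already named (“or”), whereas a part that carries its own position number can be read as a separate position or substitution (“and”). The Python sketch below illustrates this reading. The function name `interpret_mutations`, the use of single-letter amino acid codes and the shape of the returned tuple are illustrative assumptions and are not part of the specification.

```python
import re

# One residue token: wild-type letter, position, optional substituted letter,
# e.g. "E94" or "E94D". Single-letter amino acid codes are assumed.
_FULL = re.compile(r"([A-Z])(\d+)([A-Z]?)")
_BARE = re.compile(r"[A-Z]")


def interpret_mutations(spec):
    """Return ("or", [...]) or ("and", [...]) for a '/'-separated spec."""
    parts = spec.split("/")
    head = _FULL.fullmatch(parts[0])
    if head is None:
        raise ValueError(f"unrecognised token: {parts[0]!r}")
    rest = parts[1:]
    # Bare letters after the slash: alternatives at the same position ("or").
    if rest and head.group(3) and all(_BARE.fullmatch(p) for p in rest):
        prefix = head.group(1) + head.group(2)
        return "or", [parts[0]] + [prefix + p for p in rest]
    # Otherwise every part must be a full token: a combination ("and").
    for p in rest:
        if _FULL.fullmatch(p) is None:
            raise ValueError(f"unrecognised token: {p!r}")
    return "and", parts


print(interpret_mutations("P108R/K"))
print(interpret_mutations("E94D/P108K"))
```

Under this reading, a designation such as E284C/S615C, as used in the description of the figures, is interpreted as a combination of two substitutions.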
All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.

Modification Method

The present invention provides a method of modifying a template polynucleotide. The template may be modified for any purpose. The method is preferably for modifying a template polynucleotide for characterisation, such as for strand sequencing. The template polynucleotide is typically the polynucleotide that will ultimately be characterised, or sequenced, in accordance with the invention. This is discussed in more detail below.
The method provided is a method for modifying a double stranded polynucleotide template, comprising: (a) contacting the polynucleotide template with a MuA transposase in the presence of a double stranded MuA substrate that comprises an overhang at one or both ends of one strand, such that the MuA transposase (i) processes the template polynucleotide to produce a plurality of double stranded fragments and (ii) ligates the double stranded MuA substrate to one or both ends of a double stranded fragment of the plurality, thereby producing a ligation product to which is bound a MuA transposase; and (b) contacting the ligation product with a translocase, such that the translocase processes the ligation product to remove the MuA transposase, thereby producing a plurality of modified double stranded polynucleotides.
The method involves the formation of a plurality of modified double stranded polynucleotides. These modified double stranded polynucleotides are typically easier to characterise than the template polynucleotide, especially using strand sequencing. The plurality of modified double stranded polynucleotides may themselves be characterised in order to facilitate the characterisation of the template polynucleotide. For instance, the sequence of the template polynucleotide can be determined by sequencing each of the modified double stranded polynucleotides.
The modified double stranded polynucleotides are shorter than the template polynucleotide and so it is more straightforward to characterise them using strand sequencing. The modified double stranded polynucleotides may be of any length. The length is determined by the length of the template polynucleotide and the action of the MuA transposase which fragments the polynucleotide. Typically, the modified double stranded polynucleotide is less than about 5000 kb.
The modified double stranded polynucleotides can be selectively labelled by including the labels in the MuA substrates. Labelling is selective in that only the modified double stranded polynucleotides produced by the MuA transposase are labelled. A label is an entity that enables sample identification, barcoding and/or tracking of the modified double stranded polynucleotide. Suitable labels include, but are not limited to, calibration sequences, coupling moieties and adaptor bound enzymes. Examples of coupling moieties include azide, DBCO, pyridyldithiol and maleimide. Calibration sequences include any sequence of a known composition. Adaptor bound enzymes include, for example, translocases, polymerases, helicases and other polynucleotide binding proteins.
In some embodiments, the method introduces into the double stranded polynucleotides modifications which facilitate their characterisation using strand sequencing. It is well-established that coupling a polynucleotide to the membrane containing the nanopore lowers by several orders of magnitude the amount of polynucleotide required to allow its characterisation or sequencing. This is discussed in International Application No. PCT/GB2012/051191 (published as WO 2012/164270). The method of the invention allows the production of a plurality of double stranded polynucleotides each of which includes a means for coupling the polynucleotides to a membrane. This is discussed in more detail below.
The characterisation of double stranded polynucleotides using a nanopore typically requires the presence of a leader sequence designed to preferentially thread into the nanopore. The method of the invention allows the production of a plurality of double stranded polynucleotides each of which includes a single stranded leader sequence. This is discussed in more detail below.
It is also well established that linking the two strands of a double stranded polynucleotide by a bridging moiety, such as a hairpin loop, allows both strands of the polynucleotide to be characterised or sequenced by a nanopore. This is advantageous because it doubles the amount of information obtained from a single double stranded polynucleotide. Moreover, because the sequence in the template complement strand is necessarily orthogonal to the sequence of the template strand, the information from the two strands can be combined informatically. Thus, this mechanism provides an orthogonal proof-reading capability that provides higher confidence observations. This is discussed in International Application No. PCT/GB2012/051786 (published as WO 2013/014451). The method of the invention allows the production of a plurality of modified double stranded polynucleotides in which the two strands of each polynucleotide are linked using a hairpin loop.
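As a minimal illustration of combining the information from the two strands informatically, the Python sketch below reverse-complements the read of the complement strand and compares it base by base with the read of the template strand, reporting any disagreement as N. It is a simplified model that assumes equal read lengths, no insertions or deletions and plain A/C/G/T bases. The function names are illustrative and are not taken from WO 2013/014451 or from any sequencing software.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def combine_strand_reads(template_read, complement_read):
    """Merge a template read with the read of its hairpin-linked complement.

    Both reads are given 5' to 3'. Positions where the two strands agree are
    kept; positions where they disagree are reported as 'N'.
    """
    expected = reverse_complement(complement_read)
    if len(expected) != len(template_read):
        raise ValueError("reads must have equal length in this simplified model")
    return "".join(t if t == e else "N" for t, e in zip(template_read, expected))


template = "ATGCAGT"
complement = reverse_complement(template)      # what a perfect second read gives
noisy = complement[:3] + "A" + complement[4:]  # one simulated miscall (G read as A)
print(combine_strand_reads(template, complement))
print(combine_strand_reads(template, noisy))
```

A practical implementation would align the two reads and weigh them by signal quality rather than simply masking disagreements; the sketch only shows why a second, complementary read can expose an error in the first.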

Template Polynucleotide

The method of the invention modifies a template double stranded polynucleotide, preferably for characterisation. The template polynucleotide is typically the polynucleotide that will ultimately be characterised, or sequenced, in accordance with the invention. It may also be called the target double stranded polynucleotide or the double stranded polynucleotide of interest. A polynucleotide, such as a nucleic acid, is a macromolecule comprising two or more nucleotides. The polynucleotide or nucleic acid may comprise any combination of any nucleotides. The nucleotides can be naturally occurring or artificial. One or more nucleotides in the template polynucleotide can be oxidized or methylated. One or more nucleotides in the template polynucleotide may be damaged. For instance, the polynucleotide may comprise a pyrimidine dimer. Such dimers are typically associated with damage by ultraviolet light and are the primary cause of skin melanomas. One or more nucleotides in the template polynucleotide may be modified, for instance with a label or a tag. Suitable labels are described below. The template polynucleotide may comprise one or more spacers.
A nucleotide typically contains a nucleobase, a sugar and at least one phosphate group. The nucleobase and sugar form a nucleoside.
The nucleobase is typically heterocyclic. Nucleobases include, but are not limited to, purines and pyrimidines and more specifically adenine (A), guanine (G), thymine (T), uracil (U) and cytosine (C).
The sugar is typically a pentose sugar. Nucleotide sugars include, but are not limited to, ribose and deoxyribose. The sugar is preferably a deoxyribose.
The template double stranded polynucleotide preferably comprises the following nucleosides: deoxyadenosine (dA), deoxyuridine (dU) and/or thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC).
The nucleotide is typically a ribonucleotide or deoxyribonucleotide. The nucleotide is preferably a deoxyribonucleotide. The nucleotide typically contains a monophosphate, diphosphate or triphosphate. Phosphates may be attached on the 5′ or 3′ side of a nucleotide.
Nucleotides include, but are not limited to, adenosine monophosphate (AMP), guanosine monophosphate (GMP), thymidine monophosphate (TMP), uridine monophosphate (UMP), 5-methylcytidine monophosphate, 5-hydroxymethylcytidine monophosphate, cytidine monophosphate (CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate (dTMP), deoxyuridine monophosphate (dUMP) and deoxycytidine monophosphate (dCMP). The nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP, dCMP and dUMP. The nucleotides are most preferably selected from dAMP, dTMP, dGMP, dCMP and dUMP.
The template double stranded polynucleotide preferably comprises the following nucleotides: dAMP, dUMP and/or dTMP, dGMP and dCMP.
A nucleotide may be abasic (i.e. lack a nucleobase). A nucleotide may also lack a nucleobase and a sugar (i.e. is a C3 spacer).
The nucleotides in the template polynucleotide may be attached to each other in any manner. The nucleotides are typically attached by their sugar and phosphate groups as in nucleic acids. The nucleotides may be connected via their nucleobases as in pyrimidine dimers.
The template polynucleotide is double stranded. The template polynucleotide may contain some single stranded regions, but at least a portion of the template polynucleotide is double stranded.
The template polynucleotide can be a nucleic acid, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The template polynucleotide can comprise one strand of RNA hybridised to one strand of DNA. The polynucleotide may be any synthetic nucleic acid known in the art, such as peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA) or other synthetic polymers with nucleotide side chains.
The template polynucleotide can be any length. For example, the polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotide pairs in length. The polynucleotide can be 1000 or more nucleotide pairs, 5000 or more nucleotide pairs in length or 100000 or more nucleotide pairs in length.
The template polynucleotide is typically present in any suitable sample. The invention is typically carried out on a sample that is known to contain or suspected to contain the template polynucleotide. Alternatively, the invention may be carried out on a sample to confirm the identity of one or more template polynucleotides whose presence in the sample is known or expected.
The sample may be a biological sample. The invention may be carried out in vitro using at least one sample obtained from or extracted from any organism or microorganism. The organism or microorganism is typically archaeal, prokaryotic or eukaryotic and typically belongs to one of the five kingdoms: plantae, animalia, fungi, monera and protista. The invention may be carried out in vitro on at least one sample obtained from or extracted from any virus. The sample is preferably a fluid sample. The sample typically comprises a body fluid of the patient. The sample may be urine, lymph, saliva, mucus or amniotic fluid but is preferably blood, plasma or serum. Typically, the sample is human in origin, but alternatively it may be from another animal, such as a commercially farmed animal (for example horses, cattle, sheep, fish, chickens or pigs) or a pet (for example cats or dogs). Alternatively, the sample may be of plant origin, such as a sample obtained from a commercial crop, such as a cereal, legume, fruit or vegetable, for example wheat, barley, oats, canola, maize, soya, rice, rhubarb, bananas, apples, tomatoes, potatoes, grapes, tobacco, beans, lentils, sugar cane, cocoa, broccoli or cotton.
The sample may be a non-biological sample. The non-biological sample is preferably a fluid sample. Examples of non-biological samples include surgical fluids, water such as drinking water, sea water or river water, and reagents for laboratory tests.
The sample is typically processed prior to being used in the invention, for example by centrifugation or by passage through a membrane that filters out unwanted molecules or cells, such as red blood cells. The sample may be measured immediately upon being taken. The sample may also be stored prior to assay, preferably below −70° C.

MuA and Conditions

The template polynucleotide is contacted with a MuA transposase. This contacting occurs under conditions which allow the transposase to function, e.g. to fragment the template polynucleotide and to ligate MuA substrates to one or both ends of the fragments. MuA transposase is commercially available, for instance from Thermo Scientific (Catalogue Number F-750C, 20 μL (1.1 μg/μL)). The MuA transposase may be a wild type MuA transposase or a modified MuA transposase. Conditions under which MuA transposase will function are known in the art. Examples of suitable conditions are described in the Examples.

Population of Substrates

The template polynucleotide is contacted with a population of double stranded MuA substrates. The MuA substrates contain a known MuA recognition sequence. Incubation of the template polynucleotide and MuA substrates with MuA results in adaptor formation. The double stranded substrates are polynucleotide substrates and may be formed from any of the nucleotides or nucleic acids discussed above. The MuA substrates are typically formed from the same nucleotides as the template polynucleotide, except for the universal nucleotides or at least one nucleotide which comprises a nucleoside that is not present in the template polynucleotide.
The population of substrates is typically homogeneous (i.e. typically contains a plurality of identical substrates). The population of substrates may be heterogeneous (i.e. may contain a plurality of different substrates).
Suitable substrates for a MuA transposase are known in the art (Saariaho and Savilahti, Nucleic Acids Research, 2006; 34(10): 3139-3149 and Lee and Harshey, J. Mol. Biol., 2001; 314: 433-444).
Each substrate typically comprises a double stranded portion which provides its activity as a substrate for MuA transposase. The double stranded portion is typically the same in each substrate. The population of substrates may comprise different double stranded portions.
The double stranded portion in each substrate is typically at least 50 nucleotide pairs in length, such as at least 55, at least 60 or at least 65 nucleotide pairs in length. The double stranded portion may have a length of up to 10 kb, such as 5 kb, 1 kb or 100 base pairs. The double stranded portion in each substrate preferably comprises a dinucleotide comprising deoxycytidine (dC) and deoxyadenosine (dA) at the 3′ end of each strand. The dC and dA are typically in different orientations in the two strands of the double stranded portion, i.e. one strand has dC/dA and the other strand has dA/dC at the 3′ end when reading from 5′ to 3′.
One strand of the double stranded portion preferably comprises the sequence shown in SEQ ID NO: 94 and the other strand of the double stranded portion preferably comprises a sequence which is complementary to the sequence shown in SEQ ID NO: 94.

Overhangs

Each substrate comprises an overhang at one or both ends of one strand, i.e. at least one overhang on one strand. The one strand in the double stranded substrate having an overhang at one or both ends is also called the one substrate strand.
If there is only one overhang, it is preferably located at the 5′ end of the one substrate strand. After fragmentation of the template polynucleotide and ligation of the MuA substrate to the fragments of the template polynucleotide (tagmentation), constructs comprising a fragment of the template polynucleotide and one or more MuA substrates are formed. In such embodiments, a translocase that moves in the 5′ to 3′ direction may be used to remove the MuA transposases from the constructs.
If there are two overhangs, i.e. one at each end of one substrate strand, a translocase that moves in either direction, i.e. from 5′ to 3′ or from 3′ to 5′, may be used to remove the MuA transposases from the constructs.
Each substrate preferably comprises a double stranded portion which comprises the sequence shown in SEQ ID NO: 94 hybridised to a sequence which is complementary to the sequence shown in SEQ ID NO: 94. The one overhang is preferably at the 5′ end of the sequence which is complementary to the sequence shown in SEQ ID NO: 94. The sequence complementary to the sequence shown in SEQ ID NO: 94 may have overhangs at both ends. The sequence complementary to the sequence shown in SEQ ID NO: 94 is the one substrate strand.
The overhang may be at least 3, at least 4, at least 5, at least 6 or at least 7 nucleotides in length. The overhang may have a length of up to about 200 nucleotides, such as about 100, 50, 25 or 10 nucleotides. The overhang is preferably 5 nucleotides in length. The overhang may comprise any of the nucleotides discussed above.
If the overhang at the 5′ end of the one substrate strand is not closed after formation of the constructs, the translocase will remove both the MuA transposase and the one substrate strand, i.e. the substrate strand with the overhang. If the overhang at the 5′ end of the one substrate strand is closed after formation of the constructs, the translocase will remove only the MuA transposase.
Closure of the overhang occurs for example where the 5′ end of the overhang is ligated to the adjacent 3′ end of a strand of the template polynucleotide fragment.
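The rules above relating overhang position, translocase direction and closure of the 5′ overhang can be summarised in a short Python sketch. The class and function names are hypothetical. The pairing of a 3′ overhang with a translocase that moves from 3′ to 5′ is an assumption inferred from the two-overhang case; the text above states explicitly only the case of a single 5′ overhang and the case of overhangs at both ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MuASubstrate:
    """Overhang layout of the one substrate strand (hypothetical model)."""
    overhang_5p: bool
    overhang_3p: bool
    closed_5p: bool = False  # 5' overhang ligated to the adjacent fragment 3' end


def usable_directions(substrate):
    """Translocase directions that can be used to remove MuA from a construct."""
    directions = set()
    if substrate.overhang_5p:
        directions.add("5'->3'")
    if substrate.overhang_3p:
        # Assumption: a 3' overhang loads a translocase that moves 3' to 5'.
        directions.add("3'->5'")
    if not directions:
        raise ValueError("each substrate needs at least one overhang")
    return directions


def removed_by_translocase(substrate):
    """What the translocase displaces, per the closure rule for the 5' overhang."""
    if not substrate.overhang_5p:
        raise ValueError("the closure rule is stated for a 5' overhang only")
    if substrate.closed_5p:
        return {"MuA transposase"}
    return {"MuA transposase", "one substrate strand"}


both_ends = MuASubstrate(overhang_5p=True, overhang_3p=True)
print(sorted(usable_directions(both_ends)))
print(sorted(removed_by_translocase(both_ends)))
```

Modelling closure as a flag on the substrate keeps the sketch small; in practice closure is a property of each construct after tagmentation rather than of the substrate itself.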

Universal Nucleotides

In one embodiment, each substrate comprises an overhang at both ends of one strand and the overhang at the 5′ end is formed from universal nucleotides. The overhang preferably consists of universal nucleotides. This allows the overhang to be closed after formation of the constructs. Each substrate preferably comprises a double stranded portion which comprises the sequence shown in SEQ ID NO: 94 hybridised to a sequence which is complementary to the sequence shown in SEQ ID NO: 94. The overhang formed from universal nucleotides is at the 5′ end of the sequence which is complementary to the sequence shown in SEQ ID NO: 94.
The overhangs may be at least 3, at least 4, at least 5, at least 6 or at least 7 nucleotides in length. The overhangs are preferably 5 nucleotides in length.
A universal nucleotide is one which will hybridise to some degree to all of the nucleotides in the template polynucleotide. A universal nucleotide is preferably one which will hybridise to some degree to nucleotides comprising the nucleosides adenosine (A), thymine (T), uracil (U), guanine (G) and cytosine (C). The universal nucleotide may hybridise more strongly to some nucleotides than to others. For instance, a universal nucleotide (I) comprising the nucleoside, 2′-deoxyinosine, will show a preferential order of pairing of I-C > I-A > I-G ≈ I-T. For the purposes of the invention, it is only necessary that the universal nucleotide used in the oligomers hybridises to all of the nucleotides in the template polynucleotide.
The universal nucleotide preferably comprises one of the following nucleobases: hypoxanthine, 4-nitroindole, 5-nitroindole, 6-nitroindole, 3-nitropyrrole, nitroimidazole, 4-nitropyrazole, 4-nitrobenzimidazole, 5-nitroindazole, 4-aminobenzimidazole or phenyl (C6-aromatic ring). The universal nucleotide more preferably comprises one of the following nucleosides: 2′-deoxyinosine, inosine, 7-deaza-2′-deoxyinosine, 7-deaza-inosine, 2-aza-deoxyinosine, 2-aza-inosine, 4-nitroindole 2′-deoxyribonucleoside, 4-nitroindole ribonucleoside, 5-nitroindole 2′-deoxyribonucleoside, 5-nitroindole ribonucleoside, 6-nitroindole 2′-deoxyribonucleoside, 6-nitroindole ribonucleoside, 3-nitropyrrole 2′-deoxyribonucleoside, 3-nitropyrrole ribonucleoside, an acyclic sugar analogue of hypoxanthine, nitroimidazole 2′-deoxyribonucleoside, nitroimidazole ribonucleoside, 4-nitropyrazole 2′-deoxyribonucleoside, 4-nitropyrazole ribonucleoside, 4-nitrobenzimidazole 2′-deoxyribonucleoside, 4-nitrobenzimidazole ribonucleoside, 5-nitroindazole 2′-deoxyribonucleoside, 5-nitroindazole ribonucleoside, 4-aminobenzimidazole 2′-deoxyribonucleoside, 4-aminobenzimidazole ribonucleoside, phenyl C-ribonucleoside or phenyl C-2′-deoxyribosyl nucleoside. The universal nucleotide most preferably comprises 2′-deoxyinosine.
The universal nucleotides in each overhang may be different from one another. The universal nucleotides in each overhang are preferably the same. All of the universal nucleotides in the population of substrates are preferably the same universal nucleotide.
The method of the invention preferably comprises

- (a) contacting the template polynucleotide with a MuA transposase and a population of double stranded MuA substrates each comprising an overhang at both ends of one strand, wherein the overhang at the 5′ end of the one strand consists of universal nucleotides, such that the transposase fragments the template polynucleotide into fragments and ligates a substrate to one or both ends of the double stranded fragments and thereby producing a plurality of fragment/substrate constructs;
- (b) allowing the overhangs consisting of universal nucleotides to hybridise to the opposite fragment strands in the constructs;
- (c) ligating the overhangs consisting of universal nucleotides to the adjacent fragment strands in the constructs; and
- (d) using a translocase to remove the MuA transposases from the constructs and thereby producing a plurality of modified double stranded polynucleotides. In this embodiment, the translocase binds to the overhangs at the 3′ ends of the one substrate strands in the constructs and moves 3′ to 5′ to remove the MuA transposase. Since the 5′ overhang is closed, the one substrate strands remain in the constructs.

The overhang(s) of universal nucleotides may further comprise a reactive group, preferably at the 5′ end. The reactive group may be used to ligate the overhangs to the fragments in the constructs as discussed below. The reactive group may be used to ligate the fragments to the overhangs using click chemistry. Click chemistry is a term first introduced by Kolb et al. in 2001 to describe an expanding set of powerful, selective, and modular building blocks that work reliably in both small- and large-scale applications (Kolb H C, Finn, MG, Sharpless K B, Click chemistry: diverse chemical function from a few good reactions, Angew. Chem. Int. Ed. 40 (2001) 2004-2021). They have defined the set of stringent criteria for click chemistry as follows: “The reaction must be modular, wide in scope, give very high yields, generate only inoffensive by-products that can be removed by nonchromatographic methods, and be stereospecific (but not necessarily enantioselective). The required process characteristics include simple reaction conditions (ideally, the process should be insensitive to oxygen and water), readily available starting materials and reagents, the use of no solvent or a solvent that is benign (such as water) or easily removed, and simple product isolation. Purification if required must be by nonchromatographic methods, such as crystallization or distillation, and the product must be stable under physiological conditions”.
Suitable examples of click chemistry include, but are not limited to, the following:

- (a) copper-free variant of the 1,3 dipolar cycloaddition reaction, where an azide reacts with an alkyne under strain, for example in a cyclooctane ring;
- (b) the reaction of an oxygen nucleophile on one linker with an epoxide or aziridine reactive moiety on the other; and
- (c) the Staudinger ligation, where the alkyne moiety can be replaced by an aryl phosphine, resulting in a specific reaction with the azide to give an amide bond.

Any reactive group may be used in the invention. The reactive group may be one that is suitable for click chemistry. The reactive group may be any of those disclosed in International Application No. PCT/GB10/000132 (published as WO 2010/086602), particularly in Table 4 of that application.
In a further embodiment, the modification method uses a MuA transposase and a population of MuA substrates each comprising at least one overhang comprising a reactive group. The overhang(s) may be any length and may comprise any combination of any nucleotide(s). Suitable lengths and nucleotides are disclosed above. Suitable reactive groups are discussed above. Accordingly, the invention provides a method for modifying a template double stranded polynucleotide, comprising:

- (a) contacting the template polynucleotide with a MuA transposase and a population of double stranded MuA substrates each comprising an overhang at both ends of one strand, wherein the overhang at the 5′ end of the one strand comprises a reactive group, such that the transposase fragments the template polynucleotide and ligates a substrate to one or both ends of the double stranded fragments and thereby producing a plurality of fragment/substrate constructs;
- (b) ligating the overhangs to the fragments in the constructs using the reactive group; and
- (c) using a translocase to remove the MuA transposases from the constructs and thereby producing a plurality of modified double stranded polynucleotides. In this embodiment, the translocase binds to the overhangs at the 3′ ends of the one substrate strands in the constructs and moves 3′ to 5′ to remove the MuA transposase. Since the 5′ overhang is closed, the one substrate strands remain in the constructs.

Nucleosides that are not Present in the Template Polynucleotide

In one embodiment, each substrate comprises (i) an overhang at both ends of one strand and (ii) at least one nucleotide 10 nucleotides or fewer from the overhang at the 5′ end of the one strand which comprises a nucleoside that is not present in the template polynucleotide. For example, the nucleotide that is not present in the template polynucleotide is typically a non-natural nucleotide where the template polynucleotide comprises only natural nucleotides.
As discussed above, the double stranded portion in each substrate preferably comprises a dinucleotide comprising deoxycytidine (dC) and deoxyadenosine (dA) at the 3′ end of each strand and a dinucleotide comprising thymidine (dT) and deoxyguanosine (dG) at the 5′ end of each strand. In some embodiments, one or both of the nucleotides in the dT and dG dinucleotide of the one substrate strand may be replaced with a nucleotide comprising a nucleoside that is not present in the template polynucleotide as discussed below. In a preferred embodiment, the template polynucleotide comprises deoxyadenosine (dA), thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC), but not deoxyuridine (dU) and the dT in the dT and dG dinucleotide of one strand is replaced with a nucleotide comprising deoxyuridine (dU). This is exemplified below.
The double stranded portion preferably comprises the sequence shown in SEQ ID NO: 94 and a sequence which is complementary to the sequence shown in SEQ ID NO: 94 and which is modified to include at least one nucleotide that is not present in the template polynucleotide. The sequence complementary to SEQ ID NO: 94 further comprises the overhang, i.e. is the one substrate strand. In a more preferred embodiment, the double stranded portion comprises the sequence shown in SEQ ID NO: 94 and the sequence shown in SEQ ID NO: 95 (see below). In SEQ ID NO: 95, the dT in the dT and dG dinucleotide at the 5′ end has been replaced with dU. This double stranded portion (shown below) may be used when the template polynucleotide comprises deoxyadenosine (dA), thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC), but not deoxyuridine (dU).

	(SEQ 94)
	5′-GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGC

	CGCTTCA-3′

	(SEQ 95)
	3′-CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCG

	GCGAAG U -5′

The overhangs may be at least 3, at least 4, at least 5, at least 6 or at least 7 nucleotides in length. The overhangs are preferably 4 nucleotides in length. The overhangs may comprise any of the nucleotides discussed above.
Each substrate comprises at least one nucleotide in the one substrate strand which is 10 nucleotides or fewer from the overhang at the 5′ end and which comprises a nucleoside that is not present in the template polynucleotide. Each substrate may comprise any number of nucleotides which comprise a nucleoside that is not present in the template polynucleotide, such as 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. If a substrate comprises more than one nucleotide that is not present in the template polynucleotide, those nucleotides are typically the same, but may be different.
If the template polynucleotide comprises deoxyadenosine (dA), thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC) but not deoxyuridine (dU), the nucleoside that is not present in the template polynucleotide is preferably deoxyuridine (dU).
In a preferred embodiment, one strand of the double stranded portion comprises the sequence shown in SEQ ID NO: 94 and the other strand of the double stranded portion comprises the sequence shown in SEQ ID NO: 95 (see above). In SEQ ID NO: 95, the dT in the dT and dG dinucleotide at the 5′ end has been replaced with dU. The overhang at the 5′ end of SEQ ID NO: 95 is attached to the U.
In a most preferred embodiment, each substrate comprises the sequence shown in SEQ ID NO: 94 and the sequence shown in SEQ ID NO: 96 (see below). This substrate (shown below) may be used when the template polynucleotide comprises deoxyadenosine (dA), thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC), but not deoxyuridine (dU).

	(SEQ 94)
	5′-GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGC

	CGCTTCA-3′

	(SEQ 96)
	3′-CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCG

	GCGAAG U CTAG-5′

Each substrate also comprises an overhang at the 3′ end of the sequence shown in SEQ ID NO: 96.
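As a rough illustration of how the displayed strands relate to one another, the following Python sketch derives the lower strand by base-complementing SEQ ID NO: 94, replacing the 5′-terminal dT with dU and optionally appending the 4-nucleotide 5′ overhang shown above. The function names are hypothetical, the strand is written 3′ to 5′ as in the displays, and the 3′ overhang is not modelled.

```python
# Watson-Crick complement for the four natural DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# SEQ ID NO: 94, written 5' to 3' (the two display lines joined).
SEQ_94 = "GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGC" "CGCTTCA"


def complement_3_to_5(top_5_to_3):
    """Return the complementary strand written 3' to 5', aligned under the top strand."""
    return "".join(COMPLEMENT[base] for base in top_5_to_3)


def substrate_strand(top_5_to_3, overhang_3_to_5=""):
    """Complement the top strand, swap the 5'-terminal dT for dU, add an optional 5' overhang."""
    bottom = complement_3_to_5(top_5_to_3)
    if bottom[-1] != "T":
        raise ValueError("expected dT at the 5' end of the complementary strand")
    return bottom[:-1] + "U" + overhang_3_to_5


seq_95 = substrate_strand(SEQ_94)          # duplex portion with 5'-terminal dU
seq_96 = substrate_strand(SEQ_94, "CTAG")  # same strand plus the 5' overhang
print(seq_95)
print(seq_96)
```

Writing the lower strand 3′ to 5′ keeps it aligned with SEQ ID NO: 94 position by position, which matches the way the two strands are displayed in the text.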
If the template polynucleotide comprises deoxyadenosine (dA), deoxyuridine (dU), deoxyguanosine (dG) and deoxycytidine (dC) but not thymidine (dT), the nucleoside that is not present in the template polynucleotide is preferably thymidine (dT).
The nucleoside that is not present in the template polynucleotide is preferably abasic, adenosine (A), uridine (U), 5-methyluridine (m⁵U), cytidine (C) or guanosine (G) or preferably comprises urea, 5,6-dihydroxythymine, thymine glycol, 5-hydroxy-5-methylhydantoin, uracil glycol, 6-hydroxy-5,6-dihydrothymine, methyltartronylurea, 7,8-dihydro-8-oxoguanine (8-oxoguanine), 8-oxoadenine, fapy-guanine, methyl-fapy-guanine, fapy-adenine, aflatoxin B1-fapy-guanine, 5-hydroxy-cytosine, 5-hydroxy-uracil, 3-methyladenine, 7-methylguanine, 1,N6-ethenoadenine, hypoxanthine, 5-hydroxymethyluracil, 5-formyluracil or a cis-syn-cyclobutane pyrimidine dimer.
The at least one nucleotide is 10 nucleotides or fewer from the overhang at the 5′ end, such as 9, 8, 7, 6, 5, 4, 3, 2, 1 or 0 nucleotides from the overhang. In other words, the at least one nucleotide is preferably at any of positions A to K in the Example below. The at least one nucleotide is preferably 0 nucleotides from the overhang (i.e. is adjacent to the overhang). In other words, the at least one nucleotide is preferably at position K in the Example below.

XXXXXXXXXXX

ABCDEFGHIJKXXXX

The at least one nucleotide may be the first nucleotide in the overhang. In other words, the at least one nucleotide may be at position A in the Example below.

XXXXXXXXXX

XXXXXXXXXXAXXX

All of the nucleotides in the overhang may comprise a nucleoside that is not present in the template polynucleotide. A person skilled in the art is capable of designing suitable substrates.
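The positional rule for the double stranded portion (positions A to K in the first diagram) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the strand is written 3′ to 5′ so that the 5′ overhang is at the right-hand end, and nucleotides located within the overhang itself are not considered.

```python
# SEQ ID NO: 96 as displayed above (3' to 5'), ending in the 4-nucleotide overhang CTAG.
SEQ_96_3_TO_5 = "CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCGGCGAAGUCTAG"


def non_template_distances(strand_3_to_5, overhang_len, template_alphabet="ACGT"):
    """List the distance from the 5' overhang of each duplex nucleotide absent from the template.

    Distance 0 means adjacent to the overhang (position K in the first diagram);
    distance 10 corresponds to position A.
    """
    duplex = strand_3_to_5[: len(strand_3_to_5) - overhang_len]
    return [
        distance
        for distance, base in enumerate(reversed(duplex))
        if base not in template_alphabet
    ]


def satisfies_position_rule(strand_3_to_5, overhang_len, template_alphabet="ACGT", max_distance=10):
    """True if at least one non-template nucleotide lies within max_distance of the overhang."""
    distances = non_template_distances(strand_3_to_5, overhang_len, template_alphabet)
    return any(distance <= max_distance for distance in distances)


print(non_template_distances(SEQ_96_3_TO_5, 4))
print(satisfies_position_rule(SEQ_96_3_TO_5, 4))
```

For a template containing dU but not dT, the same check could be run with `template_alphabet="ACGU"`.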
The method of the invention preferably comprises

- (a) contacting the template polynucleotide with a MuA transposase and a population of double stranded MuA substrates each comprising (i) an overhang at both ends of one strand and (ii) at least one nucleotide 10 nucleotides or fewer from the overhang at the 5′ end of the one strand which comprises a nucleoside that is not present in the template polynucleotide such that the transposase fragments the template polynucleotide into fragments and ligates a substrate at one or both ends of the double stranded fragments and thereby producing a plurality of fragment/substrate constructs;
- (b) removing the overhangs at the 5′ end of the one substrate strands from the constructs by selectively removing the at least one nucleotide and thereby producing a plurality of double stranded constructs comprising single stranded gaps;
- (c) repairing the single stranded gaps in the constructs; and
- (d) using a translocase to remove the MuA transposases from the constructs and thereby producing a plurality of modified double stranded polynucleotides.

Ligating the Overhangs

In those embodiments in which the MuA substrates comprise overhangs of universal nucleotides, the method comprises ligating the overhangs to the fragments in the constructs. This may be done using any method of ligating nucleotides known in the art. For instance, it may be done using a ligase, such as a DNA ligase. Alternatively, if the overhangs comprise a reactive group, the reactive group may be used to ligate the overhangs to the fragments in the constructs.
For instance, a nucleotide comprising a complementary reactive group may be attached to the fragments and the two reactive groups may be reacted together to ligate the overhangs to the fragments. Click chemistry may be used as discussed above.

Selective Removal

Methods are known in the art for selectively removing the nucleotide(s) which comprise(s) a nucleoside that is not present in the template polynucleotide from the ligated constructs. Nucleotides are selectively removed if they are removed (or excised) from the ligated constructs, but the other nucleotides in the ligated constructs (i.e. those comprising different nucleosides) are not removed (or excised).
Nucleotides comprising deoxyuridine (dU) may be selectively removed using Uracil-Specific Excision Reagent (USER®), which is a mixture of Uracil DNA glycosylase (UDG) and the DNA glycosylase-lyase Endonuclease VIII.
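A toy Python model of selective removal is given below. It assumes, as a simplification, that excising a dU splits the strand at that position and discards the dU, so that the short 5′ overhang is released from the remainder of the substrate strand. The function name is hypothetical and no enzyme kinetics or strand hybridisation are modelled.

```python
# SEQ ID NO: 96 as displayed in the text (3' to 5'); reversing it gives the 5' to 3' reading.
SEQ_96_3_TO_5 = "CAAAAGCGTAAATAGCACTTTGCGAAAGCGCAAAAAGCACGCGGCGAAGUCTAG"


def excise_uracil(strand_5_to_3):
    """Split a strand at every dU and drop the dU itself, returning the remaining fragments."""
    return [fragment for fragment in strand_5_to_3.split("U") if fragment]


fragments = excise_uracil(SEQ_96_3_TO_5[::-1])
released_overhang, retained_strand = fragments
print(released_overhang)     # the 5' overhang, read 5' to 3'
print(len(retained_strand))  # nucleotides left in the substrate strand
```

In this model the retained fragment no longer carries the 5′ overhang, which is what leaves the single stranded gap to be repaired in the following step.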

Repairing the Gaps

Methods are known in the art for repairing the single stranded gaps in the double stranded constructs. For instance, the gaps can be repaired using a polymerase and a ligase, such as DNA polymerase and a DNA ligase. Alternatively, the gaps can be repaired using random oligonucleotides of sufficient length to bridge the gaps and a ligase.

Translocases

Any translocase that is capable of removing the MuA transposase may be used in the invention. This may occur, for example, as a result of the unwinding of double stranded polynucleotide by a translocase.
The translocase is preferably a helicase. Suitable helicases are well-known in the art (M. E. Fairman-Williams et al., Curr. Opin. Struct Biol., 2010, 20 (3), 313-324, T. M. Lohman et al., Nature Reviews Molecular Cell Biology, 2008, 9, 391-401).
The helicase is preferably a member of superfamily 1 or superfamily 2. The helicase is more preferably a member of one of the following families: Pif1-like, Upf1-like, UvrD/Rep, Ski-like, Rad3/XPD, NS3/NPH-II, DEAD, DEAH/RHA, RecG-like, RecQ-like, T1R-like, Swi/Snf-like and Rig-I-like. The first three of those families are in superfamily 1 and the second ten families are in superfamily 2. The helicase is more preferably a member of one of the following subfamilies: RecD, Upf1 (RNA), PcrA, Rep, UvrD, Hel308, Mtr4 (RNA), XPD, NS3 (RNA), Mss116 (RNA), Prp43 (RNA), RecG, RecQ, T1R, RapA and Hef (RNA). The first five of those subfamilies are in superfamily 1 and the second eleven subfamilies are in superfamily 2. Members of the Upf1, Mtr4, NS3, Mss116, Prp43 and Hef subfamilies are RNA helicases. Members of the remaining subfamilies are DNA helicases. The helicase may be Srs2. The helicase may be RecBCD.
The helicase is preferably a Hel308 helicase. Any Hel308 helicase may be used in accordance with the invention. Hel308 helicases are also known as ski2-like helicases and the two terms can be used interchangeably. Suitable Hel308 helicases are disclosed in Table 4 of International Application No. PCT/GB2012/052579 (published as WO 2013/057495).
The Hel308 helicase typically comprises the amino acid motif Q-X1-X2-G-R-A-G-R (hereinafter called the Hel308 motif; SEQ ID NO: 8). The Hel308 motif is typically part of the helicase motif VI (Tuteja and Tuteja, Eur. J. Biochem. 271, 1849-1863 (2004)). X1 may be C, M or L. X1 is preferably C. X2 may be any amino acid residue. X2 is typically a hydrophobic or neutral residue. X2 may be A, F, M, C, V, L, I, S, T, P or R. X2 is preferably A, F, M, C, V, L, I, S, T or P. X2 is more preferably A, M or L. X2 is most preferably A or M.
The Hel308 helicase preferably comprises the motif Q-X1-X2-G-R-A-G-R-P (hereinafter called the extended Hel308 motif; SEQ ID NO: 9) wherein X1 and X2 are as described above.
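The motif definitions above can be expressed as regular expressions. The Python sketch below is illustrative only: the pattern and function names are hypothetical, X1 is restricted to C, M or L, and X2 is restricted to the residues listed in the text (A, F, M, C, V, L, I, S, T, P or R), although the text also allows X2 to be any amino acid residue.

```python
import re

# Q-X1-X2-G-R-A-G-R, with X1 in {C, M, L} and X2 in the listed residues.
HEL308_MOTIF = re.compile(r"Q[CML][AFMCVLISTPR]GRAGR")
# Extended motif: the same pattern followed by P.
EXTENDED_HEL308_MOTIF = re.compile(r"Q[CML][AFMCVLISTPR]GRAGRP")


def find_hel308_motifs(protein, extended=False):
    """Return (1-based start position, matched motif) for each motif found in the sequence."""
    pattern = EXTENDED_HEL308_MOTIF if extended else HEL308_MOTIF
    return [(match.start() + 1, match.group()) for match in pattern.finditer(protein)]


# Made-up flanking residues around the Hel308 Mbu motif, for demonstration only.
example = "AAQMAGRAGRPKK"
print(find_hel308_motifs(example))
print(find_hel308_motifs(example, extended=True))
```

Replacing the X2 character class with `[A-Z]` would give the broadest reading of the motif, in which X2 is any amino acid residue.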
The most preferred Hel308 helicases, Hel308 motifs and extended Hel308 motifs are shown in Table 1 below.

TABLE 1

Preferred Hel308 helicases and their motifs

SEQ ID NO:	Helicase	Names	% Identity Hel308 Pfu	% Identity Hel308 Mbu	Hel308 motif	Extended Hel308 motif

10	Hel308 Mbu	Methanococcoides burtonii	37%	—	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
13	Hel308 Pfu	Pyrococcus furiosus DSM 3638	—	37%	QMLGRAGR (SEQ ID NO: 14)	QMLGRAGRP (SEQ ID NO: 15)
16	Hel308 Hvo	Haloferax volcanii	34%	41%	QMMGRAGR (SEQ ID NO: 17)	QMMGRAGRP (SEQ ID NO: 18)
19	Hel308 Hla	Halorubrum lacusprofundi	35%	42%	QMCGRAGR (SEQ ID NO: 20)	QMCGRAGRP (SEQ ID NO: 21)
22	Hel308 Csy	Cenarchaeum symbiosum	34%	34%	QLCGRAGR (SEQ ID NO: 23)	QLCGRAGRP (SEQ ID NO: 24)
25	Hel308 Sso	Sulfolobus solfataricus	35%	33%	QMSGRAGR (SEQ ID NO: 26)	QMSGRAGRP (SEQ ID NO: 27)
28	Hel308 Mfr	Methanogenium frigidum	37%	44%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
29	Hel308 Mok	Methanothermococcus okinawensis	37%	34%	QCIGRAGR (SEQ ID NO: 30)	QCIGRAGRP (SEQ ID NO: 31)
32	Hel308 Mig	Methanotorris igneus Kol 5	40%	35%	QCIGRAGR (SEQ ID NO: 30)	QCIGRAGRP (SEQ ID NO: 31)
33	Hel308 Tga	Thermococcus gammatolerans EJ3	60%	38%	QMMGRAGR (SEQ ID NO: 17)	QMMGRAGRP (SEQ ID NO: 18)
34	Hel308 Tba	Thermococcus barophilus MP	57%	35%	QMIGRAGR (SEQ ID NO: 35)	QMIGRAGRP (SEQ ID NO: 36)
37	Hel308 Tsi	Thermococcus sibiricus MM 739	56%	35%	QMMGRAGR (SEQ ID NO: 17)	QMMGRAGRP (SEQ ID NO: 18)
38	Hel308 Mba	Methanosarcina barkeri str. Fusaro	39%	60%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
39	Hel308 Mac	Methanosarcina acetivorans	38%	60%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
40	Hel308 Mmah	Methanohalophilus mahii DSM 5219	38%	60%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
41	Hel308 Mmaz	Methanosarcina mazei	38%	60%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
42	Hel308 Mth	Methanosaeta thermophila PT	39%	46%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
43	Hel308 Mzh	Methanosalsum zhilinae DSM 4017	39%	57%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
44	Hel308 Mev	Methanohalobium evestigatum Z-7303	38%	61%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
45	Hel308 Mma	Methanococcus maripaludis	36%	32%	QCIGRAGR (SEQ ID NO: 30)	QCIGRAGRP (SEQ ID NO: 31)
46	Hel308 Nma	Natrialba magadii	37%	43%	QMMGRAGR (SEQ ID NO: 17)	QMMGRAGRP (SEQ ID NO: 18)
47	Hel308 Mbo	Methanoregula boonei 6A8	38%	45%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
48	Hel308 Fac	Ferroplasma acidarmanus	34%	32%	QMIGRAGR (SEQ ID NO: 35)	QMIGRAGRP (SEQ ID NO: 36)
49	Hel308 Mfe	Methanocaldococcus fervens AG86	40%	35%	QCIGRAGR (SEQ ID NO: 30)	QCIGRAGRP (SEQ ID NO: 31)
50	Hel308 Mja	Methanocaldococcus jannaschii	24%	22%	QCIGRAGR (SEQ ID NO: 30)	QCIGRAGRP (SEQ ID NO: 31)
51	Hel308 Min	Methanocaldococcus infernus	41%	33%	QCIGRAGR (SEQ ID NO: 30)	QCIGRAGRP (SEQ ID NO: 31)
52	Hel308 Mhu	Methanospirillum hungatei JF-1	36%	40%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
53	Hel308 Afu	Archaeoglobus fulgidus DSM 4304	40%	40%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
54	Hel308 Htu	Haloterrigena turkmenica	35%	43%	QMAGRAGR (SEQ ID NO: 11)	QMAGRAGRP (SEQ ID NO: 12)
55	Hel308 Hpa	Haladaptatus paucihalophilus DX253	38%	45%	QMFGRAGR (SEQ ID NO: 56)	QMFGRAGRP (SEQ ID NO: 57)
58	Hel308 Hsp ski2-like helicase	Halobacterium sp. NRC-1	36.8%	42.0%	QMFGRAGR (SEQ ID NO: 56)	QMFGRAGRP (SEQ ID NO: 57)

The most preferred Hel308 motif is shown in SEQ ID NO: 17. The most preferred extended Hel308 motif is shown in SEQ ID NO: 18.
The Hel308 helicase preferably comprises the sequence of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58 or a variant thereof.
A variant of a Hel308 helicase is an enzyme that has an amino acid sequence which varies from that of the wild-type helicase and which retains polynucleotide binding activity. In particular, a variant of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58 is an enzyme that has an amino acid sequence which varies from that of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58 and which retains polynucleotide binding activity. Polynucleotide binding activity can be determined using methods known in the art. Suitable methods include, but are not limited to, fluorescence anisotropy, tryptophan fluorescence and electrophoretic mobility shift assay (EMSA). For instance, the ability of a variant to bind a single stranded polynucleotide can be determined as described in the Examples.
The variant retains helicase activity. This can be measured in various ways. For instance, the ability of the variant to translocate along a polynucleotide can be measured using electrophysiology, a fluorescence assay or ATP hydrolysis.
The variant may include modifications that facilitate handling of the polynucleotide encoding the helicase and/or facilitate its activity at high salt concentrations and/or room temperature. Variants typically differ from the wild-type helicase in regions outside of the Hel308 motif or extended Hel308 motif discussed above. However, variants may include modifications within these motif(s).
Over the entire length of the amino acid sequence of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58, a variant will preferably be at least 30% homologous to that sequence based on amino acid identity. More preferably, the variant polypeptide may be at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58 over the entire sequence. There may be at least 70%, for example at least 80%, at least 85%, at least 90% or at least 95%, amino acid identity over a stretch of 150 or more, for example 200, 300, 400, 500, 600, 700, 800, 900 or 1000 or more, contiguous amino acids (“hard homology”). Homology is determined as described below. The variant may differ from the wild-type sequence in any of the ways discussed below with reference to SEQ ID NOs: 2 and 4.
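For orientation only, the Python sketch below shows one naive way of computing percentage identity between two sequences that have already been aligned to equal length. It is not the homology method referred to in the text. The function name, the use of "-" as a gap character and the choice to ignore gapped columns are all assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(1 for a, b in compared if a == b)
    return 100.0 * matches / len(compared)


# Hel308 motifs of Hel308 Mbu and Hel308 Pfu from Table 1 differ at one of eight positions.
identity = percent_identity("QMAGRAGR", "QMLGRAGR")
print(identity)
print(identity >= 30.0)  # compare against a chosen threshold, e.g. at least 30%
```

Different alignment tools and denominators give different percentages, so a value from this sketch is only comparable with the thresholds above if the same conventions are used.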
A variant of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58 preferably comprises the Hel308 motif or extended Hel308 motif of the wild-type sequence as shown in Table 1 above. However, a variant may comprise the Hel308 motif or extended Hel308 motif from a different wild-type sequence. For instance, a variant of SEQ ID NO: 10 may comprise the Hel308 motif or extended Hel308 motif from SEQ ID NO: 13 (i.e. SEQ ID NO: 14 or 15). Variants of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58 may also include modifications within the Hel308 motif or extended Hel308 motif of the relevant wild-type sequence. Suitable modifications at X1 and X2 are discussed above when defining the two motifs. A variant of SEQ ID NO: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 or 58 preferably comprises one or more substituted cysteine residues and/or one or more substituted Faz residues to facilitate attachment as discussed above.
A variant of SEQ ID NO: 10 may lack the first 19 amino acids of SEQ ID NO: 10 and/or lack the last 33 amino acids of SEQ ID NO: 10. A variant of SEQ ID NO: 10 preferably comprises a sequence which is at least 70%, at least 75%, at least 80%, at least 85%, at least 90% or more preferably at least 95%, at least 97% or at least 99% homologous based on amino acid identity with amino acids 20 to 211 or 20 to 727 of SEQ ID NO: 10.
The Hel308 helicase may be modified as described in International Application No. PCT/GB2013/051925 (published as WO 2014/013260). In particular, two or more parts on the helicase may be connected to reduce the size of the opening in the polynucleotide domain through which a polynucleotide can unbind from the helicase and wherein the helicase retains its ability to control the movement of the polynucleotide. In Hel308 helicases, the polynucleotide domain and opening can be found between domain 2 (one of the ATPase domains) and domain 4 (the ratchet domain) and domain 2 and domain 5 (the molecular brake). The two or more parts connected in accordance with the invention are preferably (a) any amino acid in domain 2 and any amino acid in domain 4 or (b) any amino acid in domain 2 and any amino acid in domain 5. The amino acid residues which define domains 2, 4 and 5 in various Hel308 helicases are listed in Table 2 below.

TABLE 2

Amino acid residues which correspond to domains 2, 4 and 5 in various Hel308 helicases.

SEQ ID NO:	Hel308 homologue	Domain 2 start	Domain 2 end	Domain 4 start	Domain 4 end	Domain 5 start	Domain 5 end

10	Mbu	W200	E409	Y506	G669	S670	Q760
13	Pfu	W198	F398	Y490	G640	I641	S720
16	Hvo	W201	W418	Y509	G725	V726	E827
19	Hla	W201	W418	Y513	G725	V726	R824
22	Csy	W205	G414	Y504	G644	I645	K705
25	Sso	W204	L420	Y506	G651	I652	S717
28	Mfr	W193	E397	Y488	G630	I631	I684
29	Mok	W198	G415	Y551	G706	A707	I775
32	Mig	W200	E408	Y495	G632	A633	I699
33	Tga	W198	R399	Y491	G639	V640	R720
34	Tba	W219	F420	Y512	G660	V661	K755
37	Tsi	W221	L422	Y514	G662	V663	K744
38	Mba	W200	E409	Y498	G643	A644	Y729
39	Mac	W200	E409	Y499	G644	A645	F730
40	Mmah	W196	G405	Y531	G678	A679	N747
41	Mmaz	W200	E409	Y499	G644	A645	Y730
42	Mth	W203	M404	Y491	G629	A630	A693
43	Mzh	W200	N409	Y505	G651	I652	T739
44	Mev	W200	D409	Y499	G643	V644	F733
45	Mma	W196	G405	Y531	G678	A679	N747
46	Nma	W201	W413	Y541	G688	V689	F799
47	Mbo	W197	E402	Y493	G637	I638	G723
48	Fac	F197	T390	Y480	G613	V614	R681
49	Mfe	W199	Q408	Y494	G629	A630	F696
50	Mja	W197	Q406	Y492	G627	A628	F694
51	Min	W189	Q390	Y476	G604	A605	I670
52	Mhu	W198	D402	Y493	G637	V638	C799
53	Afu	W201	F399	Y487	G626	V627	E696
54	Htu	W201	W413	Y533	G680	V681	F791
55	Hpa	W201	W412	Y502	G657	V658	E752
58	Hsp (ski2-like helicase)	W210	Y421	Y512	G687	V688	S783

The Hel308 helicase preferably comprises the sequence of Hel308 Mbu (i.e. SEQ ID NO: 10) or a variant thereof. In Hel308 Mbu, the polynucleotide domain and opening can be found between domain 2 (one of the ATPase domains) and domain 4 (the ratchet domain) and domain 2 and domain 5 (the molecular brake). The two or more parts of Hel308 Mbu connected are preferably (a) any amino acid in domain 2 and any amino acid in domain 4 or (b) any amino acid in domain 2 and any amino acid in domain 5. The amino acid residues which define domains 2, 4 and 5 for Hel308 Mbu are listed in Table 2 above. The two or more parts of Hel308 Mbu connected are preferably amino acids 284 and 615 in SEQ ID NO: 10. These amino acids are preferably substituted with cysteine (i.e. E284C and S615C) such that they can be connected by cysteine linkage.
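The domain boundaries listed for Hel308 Mbu in Table 2 can be used to check which domain a given residue falls in. The Python sketch below is a minimal lookup under that assumption; the dictionary and function names are hypothetical.

```python
# Domain boundaries for Hel308 Mbu (SEQ ID NO: 10), taken from Table 2:
# domain 2 = W200 to E409, domain 4 = Y506 to G669, domain 5 = S670 to Q760.
MBU_DOMAINS = {2: (200, 409), 4: (506, 669), 5: (670, 760)}


def domain_of(position, domains=MBU_DOMAINS):
    """Return the domain number containing the residue position, or None if outside all listed domains."""
    for domain, (start, end) in domains.items():
        if start <= position <= end:
            return domain
    return None


print(domain_of(284))  # residue E284
print(domain_of(615))  # residue S615
```

On these boundaries, positions 284 and 615 fall in domain 2 and domain 4 respectively, which corresponds to option (a) in the paragraph above.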
The invention may use a mutant Hel308 Mbu protein which comprises a variant of SEQ ID NO: 10 in which E284 and S615 are modified. E284 and S615 are preferably substituted. E284 and S615 are more preferably substituted with cysteine (i.e. E284C and S615C). The variant may differ from SEQ ID NO: 10 at positions other than E284 and S615 as long as E284 and S615 are modified. The variant will preferably be at least 30% homologous to SEQ ID NO: 10 based on amino acid identity as discussed in more detail below. E284 and S615 do not have to be connected. Alternatively, E284 and S615 may be connected.
The Hel308 helicase more preferably comprises (a) the sequence of Hel308 Tga (i.e. SEQ ID NO: 33) or a variant thereof, (b) the sequence of Hel308 Csy (i.e. SEQ ID NO: 22) or a variant thereof or (c) the sequence of Hel308 Mhu (i.e. SEQ ID NO: 52) or a variant thereof.
SEQ ID NO: 10 (Hel308 Mbu) contains five natural cysteine residues. However, all of these residues are located within or around the DNA binding groove of the enzyme. Once a DNA strand is bound within the enzyme, these natural cysteine residues become less accessible for external modifications. This allows specific cysteine mutants of SEQ ID NO: 10 to be designed and attached to the moiety using cysteine linkage as discussed above. Preferred variants of SEQ ID NO: 10 have one or more of the following substitutions: A29C, Q221C, Q442C, T569C, A577C, A700C and S708C. The introduction of a cysteine residue at one or more of these positions facilitates cysteine linkage as discussed above. Other preferred variants of SEQ ID NO: 10 have one or more of the following substitutions: M2Faz, R10Faz, F15Faz, A29Faz, R185Faz, A268Faz, E284Faz, Y387Faz, F400Faz, Y455Faz, E464Faz, E573Faz, A577Faz, E649Faz, A700Faz, Y720Faz, Q442Faz and S708Faz. The introduction of a Faz residue at one or more of these positions facilitates Faz linkage as discussed above.
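The substitution shorthand used above (wild-type residue, position, replacement, e.g. A29C or Q442Faz) can be parsed mechanically. The Python sketch below is a hypothetical helper for that notation only. It accepts a single-letter replacement or the label Faz and does not check the wild-type residue against SEQ ID NO: 10.

```python
import re

# Wild-type residue (one letter), position (digits), then a one-letter residue or "Faz".
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z]|Faz)$")


def parse_substitution(code):
    """Split a substitution code such as 'A29C' into (wild_type, position, replacement)."""
    match = SUBSTITUTION.match(code)
    if match is None:
        raise ValueError("not a valid substitution code: " + repr(code))
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


for code in ["A29C", "Q221C", "Q442Faz", "S708Faz"]:
    print(parse_substitution(code))
```

Parsing the codes into tuples makes it straightforward to sort substitutions by position or to group them by replacement residue.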
The Hel308 helicase is modified by the introduction of one or more cysteine residues and/or one or more non-natural amino acids at one or more of the positions which correspond to D272, N273, D274, G281, E284, E285, E287, S288, T289, G290, E291, D293, T294, N300, R303, K304, N314, S315, N316, H317, R318, K319, L320, E322, R326, N328, S615, K717, Y720, N721 and S724 in Hel308 Mbu (SEQ ID NO: 10), wherein the helicase retains its ability to control the movement of a polynucleotide. The one or more cysteine residues and/or one or more non-natural amino acids are preferably introduced by substitution.
These modifications do not prevent the helicase from binding to a polynucleotide. For instance, the helicase may bind to a polynucleotide via internal nucleotides or at one of its termini. These modifications decrease the ability of the polynucleotide to unbind or disengage from the helicase, particularly from internal nucleotides of the polynucleotide. In other words, the one or more modifications increase the processivity of the Hel308 helicase by preventing dissociation from the polynucleotide strand. The thermal stability of the enzyme is also increased by the one or more modifications giving it an improved structural stability that is beneficial in Strand Sequencing. The modified Hel308 helicases of the invention have all of the advantages and uses discussed above.
The modified Hel308 helicase has the ability to control the movement of a polynucleotide. This can be measured as discussed above. The modified Hel308 helicase is artificial or non-natural.
The Hel308 helicase preferably comprises a variant of one of the helicases shown in Table 1 above which comprises one or more cysteine residues and/or one or more non-natural amino acids at one or more of the positions which correspond to D272, N273, D274, G281, E284, E285, E287, S288, T289, G290, E291, D293, T294, N300, R303, K304, N314, S315, N316, H317, R318, K319, L320, E322, R326, N328, S615, K717, Y720, N721 and S724 in Hel308 Mbu (SEQ ID NO: 10). The Hel308 helicase preferably comprises a variant of one of SEQ ID NOs: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 58 which comprises one or more cysteine residues and/or one or more non-natural amino acids at one or more of the positions which correspond to D272, N273, D274, G281, E284, E285, E287, S288, T289, G290, E291, D293, T294, N300, R303, K304, N314, S315, N316, H317, R318, K319, L320, E322, R326, N328, S615, K717, Y720, N721 and S724 in Hel308 Mbu (SEQ ID NO: 10).
The Hel308 helicase preferably comprises a variant of one of SEQ ID NOs: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 58 which comprises one or more cysteine residues and/or one or more non-natural amino acids at one or more of the positions which correspond to D274, E284, E285, E287, S288, T289, G290, E291, N316, K319, S615, K717 or Y720 in Hel308 Mbu (SEQ ID NO: 10).
Tables 3a and 3b below show the positions in other Hel308 helicases which correspond to D274, E284, E285, S288, S615, K717, Y720, E287, T289, G290, E291, N316 and K319 in Hel308 Mbu (SEQ ID NO: 10). For instance, in Hel308 Hvo (SEQ ID NO: 16), E283 corresponds to D274 in Hel308 Mbu, E293 corresponds to E284 in Hel308 Mbu, I294 corresponds to E285 in Hel308 Mbu, V297 corresponds to S288 in Hel308 Mbu, D671 corresponds to S615 in Hel308 Mbu, K775 corresponds to K717 in Hel308 Mbu and E778 corresponds to Y720 in Hel308 Mbu. The lack of a corresponding position in another Hel308 helicase is marked as a “-”.

TABLE 3a

Positions which correspond to D274, E284, E285, S288,
S615, K717 and Y720 in Hel308 Mbu (SEQ ID NO: 10).

SEQ ID NO:	Hel308 homologue	A	B	C	D	E	F	G

10	Mbu	D274	E284	E285	S288	S615	K717	Y720
13	Pfu	L265	E275	L276	S279	P585	K690	E693
16	Hvo	E283	E293	I294	V297	D671	K775	E778
19	Hla	E283	E293	I294	G297	D668	R775	E778
22	Csy	D280	K290	I291	S294	P589	T694	N697
25	Sso	L281	K291	Q292	D295	D596	K702	Q705
28	Mfr	H264	E272	K273	A276	G576	K678	E681
29	Mok	S279	L289	S290	D293	P649	K753	R756
32	Mig	Y276	L286	S287	D290	P579	K679	K682
33	Tga	L266	S276	L277	Q280	P583	K689	D692
34	Tba	L287	E297	L298	S301	S604	K710	E713
37	Tsi	L289	Q299	L300	G303	N606	G712	E715
38	Mba	E274	D284	E285	E288	S589	K691	D694
39	Mac	E274	D284	E285	E288	P590	K692	E695
40	Mmah	H272	L282	S283	D286	P621	K725	K728
41	Mmaz	E274	D284	E285	E288	P590	K692	E698
42	Mth	A269	L279	A280	L283	H575	K677	E680
43	Mzh	H274	Q284	E285	E288	P596	K699	Q702
44	Mev	G274	E284	E285	E288	T590	K691	Y694
45	Mma	H272	L282	S283	D286	P621	K725	K728
46	Nma	G277	T287	E288	E291	D634	K737	E740
47	Mbo	A270	E277	R278	E281	S583	G685	E688
48	Fac	Q264	F267	E268	E271	P559	K663	K666
49	Mfe	R275	L285	S286	E289	P576	K676	K679
50	Mja	I273	L283	S284	E287	P574	K674	K677
51	Min	R257	L267	S268	D271	P554	K651	K654
52	Mhu	S269	Q277	E278	R281	S583	G685	R688
53	Afu	K268	K277	A278	E281	D575	R677	E680
54	Htu	D277	D287	D288	D291	D626	K729	E732
55	Hpa	D276	D286	Q287	D290	D595	K707	E710
58	Hsp (ski2-like helicase)	E286	E296	I297	V300	D633	A737	E740

TABLE 3b

Positions which correspond to E287, T289, G290, E291,
N316 and K319 in Hel308 Mbu (SEQ ID NO: 10).

SEQ ID NO:	Hel308 homologue	H	I	J	K	L	M

10	Mbu	E287	T289	G290	E291	N316	K319
13	Pfu	D278	L280	E281	E282	D307	V310
16	Hvo	D296	S298	D299	T300	E324	T327
19	Hla	S296	S298	D299	T300	E324	A327
22	Csy	S293	G295	G296	E297	D322	S325
25	Sso	D294	I296	E297	E298	A325	D328
28	Mfr	E275	A277	A278	E279	M304	T307
29	Mok	L292	N294	P295	T296	E320	K323
32	Mig	L289	P291	P292	T293	E317	K320
33	Tga	S279	L281	E282	D283	V308	T311
34	Tba	E300	L302	E303	S304	A329	T332
37	Tsi	D302	L304	D305	T306	T331	S334
38	Mba	L287	N289	S290	E291	P316	E319
39	Mac	L287	N289	S290	E291	P316	E319
40	Mmah	L285	R287	P288	V289	K313	K316
41	Mmaz	I287	N289	S290	E291	P316	E319
42	Mth	R282	S284	G285	E286	E311	R314
43	Mzh	G287	A289	G290	E291	E316	R319
44	Mev	L287	T289	S290	D291	A316	K319
45	Mma	L285	R287	P288	V289	K313	K316
46	Nma	R290	D292	S293	D294	T319	S322
47	Mbo	L280	G282	T283	P284	K309	S312
48	Fac	L270	I272	P273	P274	D299	T302
49	Mfe	L288	P290	P291	T292	Q316	K319
50	Mja	L286	P288	P289	T290	Q314	K317
51	Min	F270	P272	P273	T274	E298	K301
52	Mhu	R280	L282	R283	D284	Q309	T312
53	Afu	L280	E282	N283	E284	G309	R312
54	Htu	R290	D292	S293	D294	T319	S322
55	Hpa	R289	V291	S292	D293	D318	S321
58	Hsp (ski2-like helicase)	G299	S301	D302	T303	E327	E330

The Hel308 helicase more preferably comprises a variant of one of SEQ ID NOs: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 58 which comprises one or more cysteine residues and/or one or more non-natural amino acids at one or more of the positions which correspond to D274, E284, E285, S288, S615, K717 and Y720 in Hel308 Mbu (SEQ ID NO: 10). The relevant positions are shown in columns A to G in Table 3a above.
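The correspondence between positions in Hel308 Mbu and positions in another homologue can be read off Table 3a by column. The following is a minimal illustrative sketch only, using three rows transcribed from Table 3a; the names `TABLE_3A` and `corresponding_position` are hypothetical and form no part of the disclosure.

```python
# Excerpt of Table 3a (columns A to G), keyed by homologue name.
# Only three rows are transcribed here for illustration.
TABLE_3A = {
    "Mbu": ["D274", "E284", "E285", "S288", "S615", "K717", "Y720"],
    "Hvo": ["E283", "E293", "I294", "V297", "D671", "K775", "E778"],
    "Tga": ["L266", "S276", "L277", "Q280", "P583", "K689", "D692"],
}


def corresponding_position(mbu_position, homologue):
    """Return the residue in `homologue` that shares a column with
    `mbu_position` in the Hel308 Mbu row of the table excerpt."""
    column = TABLE_3A["Mbu"].index(mbu_position)
    return TABLE_3A[homologue][column]


print(corresponding_position("S615", "Hvo"))
print(corresponding_position("E284", "Tga"))
```

In this excerpt, S615 of Hel308 Mbu maps to D671 in Hel308 Hvo, in agreement with the worked example given for Hel308 Hvo above.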
The helicase may comprise a cysteine residue at one, two, three, four, five, six or seven of the positions which correspond to D274, E284, E285, S288, S615, K717 and Y720 in Hel308 Mbu (SEQ ID NO: 10). Any combination of these positions may be substituted with cysteine. For instance, for each row of Table 3a above, the helicase of the invention may comprise a cysteine at any of the following combinations of the positions labelled A to G in that row: {A}, {B}, {C}, {D}, {G}, {E}, {F}, {A and B}, {A and C}, {A and D}, {A and G}, {A and E}, {A and F}, {B and C}, {B and D}, {B and G}, {B and E}, {B and F}, {C and D}, {C and G}, {C and E}, {C and F}, {D and G}, {D and E}, {D and F}, {G and E}, {G and F}, {E and F}, {A, B and C}, {A, B and D}, {A, B and G}, {A, B and E}, {A, B and F}, {A, C and D}, {A, C and G}, {A, C and E}, {A, C and F}, {A, D and G}, {A, D and E}, {A, D and F}, {A, G and E}, {A, G and F}, {A, E and F}, {B, C and D}, {B, C and G}, {B, C and E}, {B, C and F}, {B, D and G}, {B, D and E}, {B, D and F}, {B, G and E}, {B, G and F}, {B, E and F}, {C, D and G}, {C, D and E}, {C, D and F}, {C, G and E}, {C, G and F}, {C, E and F}, {D, G and E}, {D, G and F}, {D, E and F}, {G, E and F}, {A, B, C and D}, {A, B, C and G}, {A, B, C and E}, {A, B, C and F}, {A, B, D and G}, {A, B, D and E}, {A, B, D and F}, {A, B, G and E}, {A, B, G and F}, {A, B, E and F}, {A, C, D and G}, {A, C, D and E}, {A, C, D and F}, {A, C, G and E}, {A, C, G and F}, {A, C, E and F}, {A, D, G and E}, {A, D, G and F}, {A, D, E and F}, {A, G, E and F}, {B, C, D and G}, {B, C, D and E}, {B, C, D and F}, {B, C, G and E}, {B, C, G and F}, {B, C, E and F}, {B, D, G and E}, {B, D, G and F}, {B, D, E and F}, {B, G, E and F}, {C, D, G and E}, {C, D, G and F}, {C, D, E and F}, {C, G, E and F}, {D, G, E and F}, {A, B, C, D and G}, {A, B, C, D and E}, {A, B, C, D and F}, {A, B, C, G and E}, {A, B, C, G and F}, {A, B, C, E and F}, {A, B, D, G and E}, {A, B, D, G and F}, {A, B, D, E and F}, {A, B, G, E and F}, {A, C, D, G and E}, {A, C, D, G and F}, {A, C, D, E and F}, {A, C, G, E and F}, {A, D, G, E and F}, {B, C, D, G and E}, {B, C, D, G and F}, {B, C, D, E and F}, {B, C, G, E and F}, {B, D, G, E and F}, {C, D, G, E and F}, {A, B, C, D, G and E}, {A, B, C, D, G and F}, {A, B, C, D, E and F}, {A, B, C, G, E and F}, {A, B, D, G, E and F}, {A, C, D, G, E and F}, {B, C, D, G, E and F}, or {A, B, C, D, G, E and F}.
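The combinations listed above are the non-empty subsets of the seven column labels A to G. As an illustrative sketch only (the function name `position_combinations` is hypothetical), they can be enumerated programmatically rather than by hand:

```python
from itertools import combinations

# Column labels of Table 3a.
POSITIONS = ["A", "B", "C", "D", "E", "F", "G"]


def position_combinations(labels):
    """Return every non-empty subset of the given column labels,
    ordered by subset size."""
    return [
        combo
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
    ]


all_combos = position_combinations(POSITIONS)
print(len(all_combos))  # 2**7 - 1 non-empty subsets

# The same helper covers the narrower case of columns A to E.
print(len(position_combinations(POSITIONS[:5])))  # 2**5 - 1
```

Seven labels give 127 combinations and five labels give 31, which is the number of combinations recited for columns A to E.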
The helicase may comprise a non-natural amino acid, such as Faz, at one, two, three, four, five, six or seven of the positions which correspond to D274, E284, E285, S288, S615, K717 and Y720 in Hel308 Mbu (SEQ ID NO: 10). Any combination of these positions may be substituted with a non-natural amino acid, such as Faz. For instance, for each row of Table 3a above, the helicase of the invention may comprise a non-natural amino acid, such as Faz, at any of the combinations of the positions labelled A to G above.
The helicase may comprise a combination of one or more cysteines and one or more non-natural amino acids, such as Faz, at two or more of the positions which correspond to D274, E284, E285, S288, S615, K717 and Y720 in Hel308 Mbu (SEQ ID NO: 10). Any combination of one or more cysteine residues and one or more non-natural amino acids, such as Faz, may be present at the relevant positions. For instance, for each row of Table 3a and 3b above, the helicase of the invention may comprise one or more cysteines and one or more non-natural amino acids, such as Faz, at any of the combinations of the positions labelled A to G above.
The Hel308 helicase more preferably comprises a variant of one of SEQ ID NOs: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 58 which comprises one or more cysteine residues and/or one or more non-natural amino acids at one or more of the positions which correspond to D274, E284, E285, S288 and S615 in Hel308 Mbu (SEQ ID NO: 10). The relevant positions are shown in columns A to E in Table 3a above.
The helicase may comprise a cysteine residue at one, two, three, four or five of the positions which correspond to D274, E284, E285, S288 and S615 in Hel308 Mbu (SEQ ID NO: 10). Any combination of these positions may be substituted with cysteine. For instance, for each row of Table 3a above, the helicase of the invention may comprise a cysteine at any of the following combinations of the positions labelled A to E in that row: {A}, {B}, {C}, {D}, {E}, {A and B}, {A and C}, {A and D}, {A and E}, {B and C}, {B and D}, {B and E}, {C and D}, {C and E}, {D and E}, {A, B and C}, {A, B and D}, {A, B and E}, {A, C and D}, {A, C and E}, {A, D and E}, {B, C and D}, {B, C and E}, {B, D and E}, {C, D and E}, {A, B, C and D}, {A, B, C and E}, {A, B, D and E}, {A, C, D and E}, {B, C, D and E} or {A, B, C, D and E}.
The helicase may comprise a non-natural amino acid, such as Faz, at one, two, three, four or five of the positions which correspond to D274, E284, E285, S288 and S615 in Hel308 Mbu (SEQ ID NO: 10). Any combination of these positions may be substituted with a non-natural amino acid, such as Faz. For instance, for each row of Table 3a above, the helicase of the invention may comprise a non-natural amino acid, such as Faz, at any of the combinations of the positions labelled A to E above.
The helicase may comprise a combination of one or more cysteines and one or more non-natural amino acids, such as Faz, at two or more of the positions which correspond to D274, E284, E285, S288 and S615 in Hel308 Mbu (SEQ ID NO: 10). Any combination of one or more cysteine residues and one or more non-natural amino acids, such as Faz, may be present at the relevant positions. For instance, for each row of Table 3a above, the helicase of the invention may comprise one or more cysteines and one or more non-natural amino acids, such as Faz, at any of the combinations of the positions labelled A to E above.
The Hel308 helicase preferably comprises a variant of the sequence of Hel308 Mbu (i.e. SEQ ID NO: 10) which comprises one or more cysteine residues and/or one or more non-natural amino acids at D272, N273, D274, G281, E284, E285, E287, S288, T289, G290, E291, D293, T294, N300, R303, K304, N314, S315, N316, H317, R318, K319, L320, E322, R326, N328, S615, K717, Y720, N721 and S724. The variant preferably comprises D272C, N273C, D274C, G281C, E284C, E285C, E287C, S288C, T289C, G290C, E291C, D293C, T294C, N300C, R303C, K304C, N314C, S315C, N316C, H317C, R318C, K319C, L320C, E322C, R326C, N328C, S615C, K717C, Y720C, N721C or S724C. The variant preferably comprises D272Faz, N273Faz, D274Faz, G281Faz, E284Faz, E285Faz, E287Faz, S288Faz, T289Faz, G290Faz, E291Faz, D293Faz, T294Faz, N300Faz, R303Faz, K304Faz, N314Faz, S315Faz, N316Faz, H317Faz, R318Faz, K319Faz, L320Faz, E322Faz, R326Faz, N328Faz, S615Faz, K717Faz, Y720Faz, N721Faz or S724Faz.
The Hel308 helicase preferably comprises a variant of the sequence of Hel308 Mbu (i.e. SEQ ID NO: 10) which comprises one or more cysteine residues and/or one or more non-natural amino acids at D274, E284, E285, S288, S615, K717 and Y720. The helicase of the invention may comprise one or more cysteines, one or more non-natural amino acids, such as Faz, or a combination thereof at any of the combinations of the positions labelled A to G above.
The Hel308 helicase preferably comprises a variant of the sequence of Hel308 Mbu (i.e. SEQ ID NO: 10) which comprises one or more cysteine residues and/or one or more non-natural amino acids at one or more of D274, E284, E285, S288 and S615. For instance, for Hel308 Mbu (SEQ ID NO: 10), the helicase of the invention may comprise a cysteine or a non-natural amino acid, such as Faz, at any of the following combinations of positions: {D274}, {E284}, {E285}, {S288}, {S615}, {D274 and E284}, {D274 and E285}, {D274 and S288}, {D274 and S615}, {E284 and E285}, {E284 and S288}, {E284 and S615}, {E285 and S288}, {E285 and S615}, {S288 and S615}, {D274, E284 and E285}, {D274, E284 and S288}, {D274, E284 and S615}, {D274, E285 and S288}, {D274, E285 and S615}, {D274, S288 and S615}, {E284, E285 and S288}, {E284, E285 and S615}, {E284, S288 and S615}, {E285, S288 and S615}, {D274, E284, E285 and S288}, {D274, E284, E285 and S615}, {D274, E284, S288 and S615}, {D274, E285, S288 and S615}, {E284, E285, S288 and S615} or {D274, E284, E285, S288 and S615}.
The helicase preferably comprises a variant of SEQ ID NO: 10 which comprises (a) E284C and S615C, (b) E284Faz and S615Faz, (c) E284C and S615Faz or (d) E284Faz and S615C.
The helicase more preferably comprises the sequence shown in SEQ ID NO: 10 with E284C and S615C.
Preferred non-natural amino acids for use in the invention include, but are not limited to, 4-Azido-L-phenylalanine (Faz), 4-Acetyl-L-phenylalanine, 3-Acetyl-L-phenylalanine, 4-Acetoacetyl-L-phenylalanine, O-Allyl-L-tyrosine, 3-(Phenylselanyl)-L-alanine, O-2-Propyn-1-yl-L-tyrosine, 4-(Dihydroxyboryl)-L-phenylalanine, 4-[(Ethylsulfanyl)carbonyl]-L-phenylalanine, (2S)-2-amino-3-{4-[(propan-2-ylsulfanyl)carbonyl]phenyl}propanoic acid, (2S)-2-amino-3-{4-[(2-amino-3-sulfanylpropanoyl)amino]phenyl}propanoic acid, O-Methyl-L-tyrosine, 4-Amino-L-phenylalanine, 4-Cyano-L-phenylalanine, 3-Cyano-L-phenylalanine, 4-Fluoro-L-phenylalanine, 4-Iodo-L-phenylalanine, 4-Bromo-L-phenylalanine, O-(Trifluoromethyl)tyrosine, 4-Nitro-L-phenylalanine, 3-Hydroxy-L-tyrosine, 3-Amino-L-tyrosine, 3-Iodo-L-tyrosine, 4-Isopropyl-L-phenylalanine, 3-(2-Naphthyl)-L-alanine, 4-Phenyl-L-phenylalanine, (2S)-2-amino-3-(naphthalen-2-ylamino)propanoic acid, 6-(Methylsulfanyl)norleucine, 6-Oxo-L-lysine, D-tyrosine, (2R)-2-Hydroxy-3-(4-hydroxyphenyl)propanoic acid, (2R)-2-Ammoniooctanoate, 3-(2,2′-Bipyridin-5-yl)-D-alanine, 2-amino-3-(8-hydroxy-3-quinolyl)propanoic acid, 4-Benzoyl-L-phenylalanine, S-(2-Nitrobenzyl)cysteine, (2R)-2-amino-3-[(2-nitrobenzyl)sulfanyl]propanoic acid, (2S)-2-amino-3-[(2-nitrobenzyl)oxy]propanoic acid, O-(4,5-Dimethoxy-2-nitrobenzyl)-L-serine, (2S)-2-amino-6-({[(2-nitrobenzyl)oxy]carbonyl}amino)hexanoic acid, O-(2-Nitrobenzyl)-L-tyrosine, 2-Nitrophenylalanine, 4-[(E)-Phenyldiazenyl]-L-phenylalanine, 4-[3-(Trifluoromethyl)-3H-diaziren-3-yl]-D-phenylalanine, 2-amino-3-[[5-(dimethylamino)-1-naphthyl]sulfonylamino]propanoic acid, (2S)-2-amino-4-(7-hydroxy-2-oxo-2H-chromen-4-yl)butanoic acid, (2S)-3-[(6-acetylnaphthalen-2-yl)amino]-2-aminopropanoic acid, 4-(Carboxymethyl)phenylalanine, 3-Nitro-L-tyrosine, O-Sulfo-L-tyrosine, (2R)-6-Acetamido-2-ammoniohexanoate, 1-Methylhistidine, 2-Aminononanoic acid, 2-Aminodecanoic acid, L-Homocysteine, 5-Sulfanylnorvaline, 6-Sulfanyl-L-norleucine, 5-(Methylsulfanyl)-L-norvaline, N⁶-{[(2R,3R)-3-Methyl-3,4-dihydro-2H-pyrrol-2-yl]carbonyl}-L-lysine, N⁶-[(Benzyloxy)carbonyl]lysine, (2S)-2-amino-6-[(cyclopentylcarbonyl)amino]hexanoic acid, N⁶-[(Cyclopentyloxy)carbonyl]-L-lysine, (2S)-2-amino-6-({[(2R)-tetrahydrofuran-2-ylcarbonyl]amino}hexanoic acid, (2S)-2-amino-8-[(2R,3S)-3-ethynyltetrahydrofuran-2-yl]-8-oxooctanoic acid, N⁶-(tert-Butoxycarbonyl)-L-lysine, (2S)-2-Hydroxy-6-({[(2-methyl-2-propanyl)oxy]carbonyl}amino)hexanoic acid, N⁶-[(Allyloxy)carbonyl]lysine, (2S)-2-amino-6-({[(2-azidobenzyl)oxy]carbonyl}amino)hexanoic acid, N⁶-L-Prolyl-L-lysine, (2S)-2-amino-6-{[(prop-2-yn-1-yloxy)carbonyl]amino}hexanoic acid and N⁶-[(2-Azidoethoxy)carbonyl]-L-lysine.
The most preferred non-natural amino acid is 4-azido-L-phenylalanine (Faz).
As discussed above, a variant of a Hel308 helicase is an enzyme that has an amino acid sequence which varies from that of the wild-type helicase and which retains polynucleotide binding activity. A variant of one of SEQ ID NOs: 10, 13, 16, 19, 22, 25, 28, 29, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 58 may comprise additional modifications as long as it comprises one or more cysteine residues and/or one or more non-natural amino acids at one or more of the positions which correspond to D272, N273, D274, G281, E284, E285, E287, S288, T289, G290, E291, D293, T294, N300, R303, K304, N314, S315, N316, H317, R318, K319, L320, E322, R326, N328, S615, K717, Y720, N721 and S724 in Hel308 Mbu (SEQ ID NO: 10). Suitable modifications and variants are discussed above with reference to the embodiments with two or more parts connected.
A variant may comprise the mutations in domain 5 disclosed in Woodman et al. (J. Mol. Biol. (2007) 374, 1139-1144). These mutations correspond to R685A, R687A and R689A in SEQ ID NO: 10.
The two or more parts may be connected in any way. The connection can be transient, for example non-covalent. Even transient connection will reduce the size of the opening and reduce unbinding of the polynucleotide from the helicase through the opening.
The two or more parts are preferably connected by affinity molecules. Suitable affinity molecules are known in the art. The affinity molecules are preferably (a) complementary polynucleotides (International Application No. PCT/GB10/000132 (published as WO 2010/086602)), (b) an antibody or a fragment thereof and the complementary epitope (Biochemistry 6th Ed, W. H. Freeman and co (2007) pp 953-954), (c) peptide zippers (O'Shea et al., Science 254 (5031): 539-544), (d) capable of interacting by β-sheet augmentation (Remaut and Waksman Trends Biochem. Sci. (2006) 31 436-444), (e) capable of hydrogen bonding, pi-stacking or forming a salt bridge, (f) rotaxanes (Xiang Ma and He Tian Chem. Soc. Rev., 2010, 39, 70-80), (g) an aptamer and the complementary protein (James, W. in Encyclopedia of Analytical Chemistry, R. A. Meyers (Ed.) pp. 4848-4871 John Wiley & Sons Ltd, Chichester, 2000) or (h) half-chelators (Hammerstein et al. J Biol Chem. 2011 Apr. 22; 286(16): 14324-14334). For (e), hydrogen bonding occurs between a proton bound to an electronegative atom and another electronegative atom. Pi-stacking requires two aromatic rings that can stack together where the planes of the rings are parallel. Salt bridges are between groups that can delocalize their electrons over several atoms, e.g. between aspartate and arginine.
The two or more parts may be transiently connected by a hexa-his tag or Ni-NTA. The two or more parts may also be modified such that they transiently connect to each other.
The two or more parts are preferably permanently connected. In the context of the invention, a connection is permanent if it is not broken while the helicase is used or cannot be broken without intervention on the part of the user, such as using reduction to open —S—S— bonds.
The two or more parts are preferably covalently attached. The two or more parts may be covalently attached using any method known in the art.
The two or more parts may be covalently attached via their naturally occurring amino acids, such as cysteines, threonines, serines, aspartates, asparagines, glutamates and glutamines.
Naturally occurring amino acids may be modified to facilitate attachment. For instance, the naturally occurring amino acids may be modified by acylation, phosphorylation, glycosylation or farnesylation. Other suitable modifications are known in the art. Modifications to naturally occurring amino acids may be post-translation modifications. The two or more parts may be attached via amino acids that have been introduced into their sequences. Such amino acids are preferably introduced by substitution. The introduced amino acid may be cysteine or a non-natural amino acid that facilitates attachment. Suitable non-natural amino acids include, but are not limited to, 4-azido-L-phenylalanine (Faz), any one of the amino acids numbered 1-71 included in FIG. 1 of Liu C. C. and Schultz P. G., Annu. Rev. Biochem., 2010, 79, 413-444 or any one of the amino acids listed below. The introduced amino acids may be modified as discussed above.
In a preferred embodiment, the two or more parts are connected using linkers. Linker molecules are discussed in more detail below. One suitable method of connection is cysteine linkage. This is discussed in more detail below. The two or more parts are preferably connected using one or more, such as two or three, linkers. The one or more linkers may be designed to reduce the size of, or close, the opening as discussed above. If one or more linkers are being used to close the opening as discussed above, at least a part of the one or more linkers is preferably oriented such that it is not parallel to the polynucleotide when it is bound by the helicase. More preferably, all of the linkers are oriented in this manner. If one or more linkers are being used to close the opening as discussed above, at least a part of the one or more linkers preferably crosses the opening in an orientation that is not parallel to the polynucleotide when it is bound by the helicase. More preferably, all of the linkers cross the opening in this manner. In these embodiments, at least a part of the one or more linkers may be perpendicular to the polynucleotide. Such orientations effectively close the opening such that the polynucleotide cannot unbind from the helicase through the opening.
Each linker may have two or more functional ends, such as two, three or four functional ends. Suitable configurations of ends in linkers are well known in the art.
One or more ends of the one or more linkers are preferably covalently attached to the helicase. If one end is covalently attached, the one or more linkers may transiently connect the two or more parts as discussed above. If both or all ends are covalently attached, the one or more linkers permanently connect the two or more parts.
At least one of the two or more parts is preferably modified to facilitate the attachment of the one or more linkers. Any modification may be made. The linkers may be attached to one or more reactive cysteine residues, reactive lysine residues or non-natural amino acids in the two or more parts. The non-natural amino acid may be any of those discussed above. The non-natural amino acid is preferably 4-azido-L-phenylalanine (Faz). At least one amino acid in the two or more parts is preferably substituted with cysteine or a non-natural amino acid, such as Faz.
The one or more linkers are preferably amino acid sequences and/or chemical crosslinkers.
Suitable amino acid linkers, such as peptide linkers, are known in the art. The length, flexibility and hydrophilicity of the amino acid or peptide linker are typically designed such that it reduces the size of the opening, but does not disturb the functions of the helicase. Preferred flexible peptide linkers are stretches of 2 to 20, such as 4, 6, 8, 10 or 16, serine and/or glycine amino acids. More preferred flexible linkers include (SG)₁, (SG)₂, (SG)₃, (SG)₄, (SG)₅, (SG)₈, (SG)₁₀, (SG)₁₅ or (SG)₂₀ wherein S is serine and G is glycine. Preferred rigid linkers are stretches of 2 to 30, such as 4, 6, 8, 16 or 24, proline amino acids. More preferred rigid linkers include (P)₁₂ wherein P is proline. The amino acid sequence of a linker preferably comprises a polynucleotide binding moiety. Such moieties and the advantages associated with their use are discussed below.
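The repeat notation used for the peptide linkers above, such as (SG)₃ or (P)₁₂, denotes a unit repeated the stated number of times. The following is a minimal illustrative sketch, assuming single-letter amino acid codes; the helper name `repeat_linker` is hypothetical.

```python
def repeat_linker(unit, n):
    """Expand repeat notation such as (SG)n into a one-letter amino
    acid string, e.g. unit="SG", n=3 gives "SGSGSG"."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


# Flexible serine/glycine linkers for the repeat counts named in the text.
flexible = {n: repeat_linker("SG", n) for n in (1, 2, 3, 4, 5, 8, 10, 15, 20)}

# Rigid proline linker (P)12.
rigid = repeat_linker("P", 12)

print(flexible[3])
print(rigid, len(rigid))
```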
Suitable chemical crosslinkers are well-known in the art. Suitable chemical crosslinkers include, but are not limited to, those including the following functional groups: maleimide, active esters, succinimide, azide, alkyne (such as dibenzocyclooctynol (DIBO or DBCO), difluoro cycloalkynes and linear alkynes), phosphine (such as those used in traceless and non-traceless Staudinger ligations), haloacetyl (such as iodoacetamide), phosgene type reagents, sulfonyl chloride reagents, isothiocyanates, acyl halides, hydrazines, disulphides, vinyl sulfones, aziridines and photoreactive reagents (such as aryl azides, diaziridines).
Reactions between amino acids and functional groups may be spontaneous, such as cysteine/maleimide, or may require external reagents, such as Cu(I) for linking azide and linear alkynes.
Linkers can comprise any molecule that stretches across the distance required. Linkers can vary in length from one carbon (phosgene-type linkers) to many Angstroms. Examples of linear molecules include, but are not limited to, polyethyleneglycols (PEGs), polypeptides, polysaccharides, deoxyribonucleic acid (DNA), peptide nucleic acid (PNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), saturated and unsaturated hydrocarbons and polyamides. These linkers may be inert or reactive, in particular they may be chemically cleavable at a defined position, or may be themselves modified with a fluorophore or ligand. The linker is preferably resistant to dithiothreitol (DTT).
Preferred crosslinkers include 2,5-dioxopyrrolidin-1-yl 3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl 4-(pyridin-2-yldisulfanyl)butanoate and 2,5-dioxopyrrolidin-1-yl 8-(pyridin-2-yldisulfanyl)octanoate, di-maleimide PEG 1k, di-maleimide PEG 3.4k, di-maleimide PEG 5k, di-maleimide PEG 10k, bis(maleimido)ethane (BMOE), bis-maleimidohexane (BMH), 1,4-bis-maleimidobutane (BMB), 1,4 bis-maleimidyl-2,3-dihydroxybutane (BMDB), BM[PEO]2 (1,8-bis-maleimidodiethyleneglycol), BM[PEO]3 (1,11-bis-maleimidotriethylene glycol), tris[2-maleimidoethyl]amine (TMEA), dithiobismaleimidoethane (DTME), bis-maleimide PEG3, bis-maleimide PEG11, DBCO-maleimide, DBCO-PEG4-maleimide, DBCO-PEG4-NH2, DBCO-PEG4-NHS, DBCO-NHS, DBCO-PEG-DBCO 2.8 kDa, DBCO-PEG-DBCO 4.0 kDa, DBCO-15 atoms-DBCO, DBCO-26 atoms-DBCO, DBCO-35 atoms-DBCO, DBCO-PEG4-S-S-PEG3-biotin, DBCO-S-S-PEG3-biotin, DBCO-S-S-PEG11-biotin, succinimidyl 3-(2-pyridyldithio)propionate (SPDP) and maleimide-PEG (2 kDa)-maleimide (ALPHA,OMEGA-BIS-MALEIMIDO POLY (ETHYLENE GLYCOL)). The most preferred crosslinker is maleimide-propyl-SRDFWRS-(1,2-diaminoethane)-propyl-maleimide as used in the Examples.
The one or more linkers may be cleavable. This is discussed in more detail below.
The two or more parts may be connected using two different linkers that are specific for each other. One of the linkers is attached to one part and the other is attached to another part. The linkers should react to form a modified helicase of the invention. The two or more parts may be connected using the hybridization linkers described in International Application No. PCT/GB10/000132 (published as WO 2010/086602). In particular, the two or more parts may be connected using two or more linkers each comprising a hybridizable region and a group capable of forming a covalent bond. The hybridizable regions in the linkers hybridize and link the two or more parts. The linked parts are then coupled via the formation of covalent bonds between the groups. Any of the specific linkers disclosed in International Application No. PCT/GB10/000132 (published as WO 2010/086602) may be used in accordance with the invention.
The two or more parts may be modified and then attached using a chemical crosslinker that is specific for the two modifications. Any of the crosslinkers discussed above may be used.
The linkers may be labeled. Suitable labels include, but are not limited to, fluorescent molecules (such as Cy3 or AlexaFluor®555), radioisotopes, e.g. ¹²⁵I, ³⁵S, enzymes, antibodies, antigens, polynucleotides and ligands such as biotin. Such labels allow the amount of linker to be quantified. The label could also be a cleavable purification tag, such as biotin, or a specific sequence to show up in an identification method, such as a peptide that is not present in the protein itself, but that is released by trypsin digestion.
A preferred method of connecting the two or more parts is via cysteine linkage. This can be mediated by a bi-functional chemical crosslinker or by an amino acid linker with a terminal presented cysteine residue. Linkage can occur via natural cysteines in the helicase. Alternatively, cysteines can be introduced into the two or more parts of the helicase. If the two or more parts are connected via cysteine linkage, the one or more cysteines have preferably been introduced to the two or more parts by substitution.
The length, reactivity, specificity, rigidity and solubility of any bi-functional linker may be designed to ensure that the size of the opening is reduced sufficiently and the function of the helicase is retained. Suitable linkers include bismaleimide crosslinkers, such as 1,4-bis(maleimido)butane (BMB) or bis(maleimido)hexane. One drawback of bi-functional linkers is the requirement of the helicase to contain no further surface accessible cysteine residues if attachment at specific sites is preferred, as binding of the bi-functional linker to surface accessible cysteine residues may be difficult to control and may affect substrate binding or activity. If the helicase does contain several accessible cysteine residues, modification of the helicase may be required to remove them while ensuring the modifications do not affect the folding or activity of the helicase. This is discussed in International Application No. PCT/GB10/000133 (published as WO 2010/086603). The reactivity of cysteine residues may be enhanced by modification of the adjacent residues, for example on a peptide linker. For instance, the basic groups of flanking arginine, histidine or lysine residues will change the pKa of the cysteine's thiol group to that of the more reactive S⁻ group. The reactivity of cysteine residues may be protected by thiol protective groups such as 5,5′-dithiobis-(2-nitrobenzoic acid) (dTNB). These may be reacted with one or more cysteine residues of the helicase before a linker is attached. Selective deprotection of surface accessible cysteines may be possible using reducing reagents immobilized on beads (for example immobilized tris(2-carboxyethyl) phosphine, TCEP). Cysteine linkage of the two or more parts is discussed in more detail below.
Another preferred method of attaching the two or more parts is via 4-azido-L-phenylalanine (Faz) linkage. This can be mediated by a bi-functional chemical linker or by a polypeptide linker with a terminal presented Faz residue. The one or more Faz residues have preferably been introduced to the helicase by substitution. Faz linkage of two or more helicases is discussed in more detail below.
The helicase is preferably a RecD helicase. Any RecD helicase may be used in accordance with the invention. The structures of RecD helicases are known in the art (FEBS J. 2008 April; 275(8):1835-51. Epub 2008 Mar. 9. ATPase activity of RecD is essential for growth of the Antarctic Pseudomonas syringae Lz4W at low temperature. Satapathy A K, Pavankumar T L, Bhattacharjya S, Sankaranarayanan R, Ray M K; FEMS Microbiol Rev. 2009 May; 33(3):657-87. The diversity of conjugative relaxases and its application in plasmid classification. Garcillán-Barcia M P, Francia M V, de la Cruz F; J Biol Chem. 2011 Apr. 8; 286(14):12670-82. Epub 2011 Feb. 2. Functional characterization of the multidomain F plasmid TraI relaxase-helicase. Cheng Y, McNamara D E, Miley M J, Nash R P, Redinbo M R).
The RecD helicase typically comprises the amino acid motif X1-X2-X3-G-X4-X5-X6-X7 (hereinafter called the RecD-like motif I; SEQ ID NO: 59), wherein X1 is G, S or A, X2 is any amino acid, X3 is P, A, S or G, X4 is T, A, V, S or C, X5 is G or A, X6 is K or R and X7 is T or S. X1 is preferably G. X2 is preferably G, I, Y or A. X2 is more preferably G. X3 is preferably P or A. X4 is preferably T, A, V or C. X4 is more preferably T, V or C. X5 is preferably G. X6 is preferably K. X7 is preferably T or S. The RecD helicase preferably comprises Q-(X8)_16-18-X1-X2-X3-G-X4-X5-X6-X7 (hereinafter called the extended RecD-like motif I; SEQ ID NOs: 60, 61 and 62), wherein X1 to X7 are as defined above and X8 is any amino acid. There are preferably 16 X8 residues (i.e. (X8)₁₆) in the extended RecD-like motif I (SEQ ID NO: 60). Suitable sequences for (X8)₁₆ can be identified in SEQ ID NOs: 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47 and 50 of U.S. Patent Application No. 61/581,332 and SEQ ID NOs: 18, 21, 24, 25, 28, 30, 32, 35, 37, 39, 41, 42 and 44 of International Application No. PCT/GB2012/053274 (published as WO 2012/098562).
The RecD helicase preferably comprises the amino acid motif G-G-P-G-Xa-G-K-Xb (hereinafter called the RecD motif I; SEQ ID NO: 63) wherein Xa is T, V or C and Xb is T or S. Xa is preferably T. Xb is preferably T. The RecD helicase preferably comprises the sequence G-G-P-G-T-G-K-T (SEQ ID NO: 64). The RecD helicase more preferably comprises the amino acid motif Q-(X8)_16-18-G-G-P-G-Xa-G-K-Xb (hereinafter called the extended RecD motif I; SEQ ID NOs: 65, 66 and 67), wherein Xa and Xb are as defined above and X8 is any amino acid. There are preferably 16 X8 residues (i.e. (X8)₁₆) in the extended RecD motif I (SEQ ID NO: 65). Suitable sequences for (X8)₁₆ can be identified in SEQ ID NOs: 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47 and 50 of U.S. Patent Application No. 61/581,332 and SEQ ID NOs: 18, 21, 24, 25, 28, 30, 32, 35, 37, 39, 41, 42 and 44 of International Application No. PCT/GB2012/053274 (published as WO 2012/098562).
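By way of illustration only, the RecD-like motif I, the RecD motif I and their extended forms may be written as regular expressions and used to test candidate sequences. The following is a minimal sketch, assuming one-letter codes for the 20 standard amino acids; the variable names and the 16-residue filler used to exercise the extended motif are hypothetical, whereas GYAGVGKT and GGPGTGKT are the motif sequences given in this document for TraI Eco (SEQ ID NO: 86) and SEQ ID NO: 64.

```python
import re

# "Any amino acid" is taken to mean any of the 20 standard residues (an assumption).
ANY = "[ACDEFGHIKLMNPQRSTVWY]"

# RecD-like motif I: X1-X2-X3-G-X4-X5-X6-X7
RECD_LIKE_MOTIF_I = "[GSA]" + ANY + "[PASG]" + "G" + "[TAVSC]" + "[GA]" + "[KR]" + "[TS]"
# RecD motif I: G-G-P-G-Xa-G-K-Xb
RECD_MOTIF_I = "GGPG" + "[TVC]" + "GK" + "[TS]"
# Extended forms: Q, then 16 to 18 residues of any kind, then the motif itself
EXT_RECD_LIKE_MOTIF_I = "Q" + ANY + "{16,18}" + RECD_LIKE_MOTIF_I
EXT_RECD_MOTIF_I = "Q" + ANY + "{16,18}" + RECD_MOTIF_I


def is_motif(pattern: str, candidate: str) -> bool:
    """True if the whole candidate string is an instance of the motif."""
    return re.fullmatch(pattern, candidate) is not None


def find_motif(pattern: str, sequence: str):
    """Return (1-based start, matched text) for the first occurrence, or None."""
    match = re.search(pattern, sequence)
    return (match.start() + 1, match.group(0)) if match else None


print(is_motif(RECD_LIKE_MOTIF_I, "GYAGVGKT"))  # TraI Eco motif from this document
print(is_motif(RECD_MOTIF_I, "GGPGTGKT"))       # SEQ ID NO: 64 from this document
# Hypothetical filler of 16 alanines, used only to exercise the extended motif.
print(is_motif(EXT_RECD_MOTIF_I, "Q" + "A" * 16 + "GGPGTGKT"))
```

Since the RecD motif I is a narrower form of the RecD-like motif I, any sequence satisfying the former also satisfies the latter, but not the reverse.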
The RecD helicase typically comprises the amino acid motif X1-X2-X3-X4-X5-(X6)₃-Q-X7 (hereinafter called the RecD-like motif V; SEQ ID NO: 68), wherein X1 is Y, W or F, X2 is A, T, S, M, C or V, X3 is any amino acid, X4 is T, N or S, X5 is A, T, G, S, V or I, X6 is any amino acid and X7 is G or S. X1 is preferably Y. X2 is preferably A, M, C or V. X2 is more preferably A. X3 is preferably I, M or L. X3 is more preferably I or L. X4 is preferably T or S. X4 is more preferably T. X5 is preferably A, V or I. X5 is more preferably V or I. X5 is most preferably V. (X6)₃ is preferably H-K-S, H-M-A, H-G-A or H-R-S. (X6)₃ is more preferably H-K-S. X7 is preferably G. The RecD helicase preferably comprises the amino acid motif Xa-Xb-Xc-Xd-Xe-H-K-S-Q-G (hereinafter called the RecD motif V; SEQ ID NO: 69), wherein Xa is Y, W or F, Xb is A, M, C or V, Xc is I, M or L, Xd is T or S and Xe is V or I. Xa is preferably Y. Xb is preferably A. Xd is preferably T. Xe is preferably V. Preferred RecD motifs I are shown in Table 5 of U.S. Patent Application No. 61/581,332. Preferred RecD-like motifs I are shown in Table 7 of U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562). Preferred RecD-like motifs V are shown in Tables 5 and 7 of U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562).
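By way of illustration only, the RecD-like motif V and the RecD motif V may likewise be written as regular expressions. The following is a minimal sketch, assuming one-letter codes for the 20 standard amino acids; the variable names are hypothetical, and the sequence YAITVHKSQG is a hypothetical example assembled from preferred residues, whereas YAITAHGAQG, YALNVHMAQG, YCITIHRSQG and YALNAHMAQG are the motif V sequences listed in this document for TraI Eco and the TrwC helicases.

```python
import re

# "Any amino acid" is taken to mean any of the 20 standard residues (an assumption).
ANY = "[ACDEFGHIKLMNPQRSTVWY]"

# RecD-like motif V: X1-X2-X3-X4-X5-(X6)3-Q-X7
RECD_LIKE_MOTIF_V = (
    "[YWF]" + "[ATSMCV]" + ANY + "[TNS]" + "[ATGSVI]" + ANY + "{3}" + "Q" + "[GS]"
)
# RecD motif V: Xa-Xb-Xc-Xd-Xe-H-K-S-Q-G
RECD_MOTIF_V = "[YWF]" + "[AMCV]" + "[IML]" + "[TS]" + "[VI]" + "HKSQG"


def is_motif(pattern: str, candidate: str) -> bool:
    """True if the whole candidate string is an instance of the motif."""
    return re.fullmatch(pattern, candidate) is not None


# Motif V sequences listed in this document for TraI Eco and the TrwC helicases.
for candidate in ("YAITAHGAQG", "YALNVHMAQG", "YCITIHRSQG", "YALNAHMAQG"):
    print(candidate, is_motif(RECD_LIKE_MOTIF_V, candidate))

# Hypothetical sequence built from preferred residues (Y, A, I, T, V, then H-K-S-Q-G).
print("YAITVHKSQG", is_motif(RECD_MOTIF_V, "YAITVHKSQG"))
```

The narrower RecD motif V fixes (X6)₃ as H-K-S and X7 as G, so the four listed sequences satisfy the RecD-like motif V without satisfying the RecD motif V.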
The RecD helicase is preferably one of the helicases shown in Table 4 or 5 of U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562) or a variant thereof. Variants are described in U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562).
The RecD helicase is preferably a TraI helicase or a TraI subgroup helicase. TraI helicases and TraI subgroup helicases may contain two RecD helicase domains, a relaxase domain and a C-terminal domain. The TraI subgroup helicase is preferably a TrwC helicase. The TraI helicase or TraI subgroup helicase is preferably one of the helicases shown in Table 6 of U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562) or a variant thereof. Variants are described in U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562).
The TraI helicase or a TraI subgroup helicase typically comprises a RecD-like motif I as defined above (SEQ ID NO: 59) and/or a RecD-like motif V as defined above (SEQ ID NO: 68). The TraI helicase or a TraI subgroup helicase preferably comprises both a RecD-like motif I (SEQ ID NO: 59) and a RecD-like motif V (SEQ ID NO: 68). The TraI helicase or a TraI subgroup helicase typically further comprises one of the following two motifs:

- The amino acid motif H-(X1)₂-X2-R-(X3)_5-12-H-X4-H (hereinafter called the MobF motif III; SEQ ID NOs: 70 to 77), wherein X1 and X3 are any amino acid and X2 and X4 are independently selected from any amino acid except D, E, K and R. (X1)₂ is of course X1a-X1b. X1a and X1b can be the same or different amino acids. X1a is preferably D or E. X1b is preferably T or D. (X1)₂ is preferably DT or ED. (X1)₂ is most preferably DT. The 5 to 12 amino acids in (X3)_5-12 can be the same or different. X2 and X4 are independently selected from G, P, A, V, L, I, M, C, F, Y, W, H, Q, N, S and T. X2 and X4 are preferably not charged. X2 and X4 are preferably not H. X2 is more preferably N, S or A. X2 is most preferably N. X4 is most preferably F or T. (X3)_5-12 is preferably 6 or 10 residues in length. Suitable embodiments of (X3)_5-12 can be derived from SEQ ID NOs: 58, 62, 66 and 70 shown in Table 7 of U.S. Patent Application No. 61/581,332 and SEQ ID NOs: 61, 65, 69, 73, 74, 82, 86, 90, 94, 98, 102, 110, 112, 113, 114, 117, 121, 124, 125, 129, 133, 136, 140, 144, 147, 151, 152, 156, 160, 164 and 168 of International Application No. PCT/GB2012/053274 (published as WO 2012/098562).
- The amino acid motif G-X1-X2-X3-X4-X5-X6-X7-H-(X8)_6-12-H-X9 (hereinafter called the MobQ motif III; SEQ ID NOs: 78 to 84), wherein X1, X2, X3, X5, X6, X7 and X9 are independently selected from any amino acid except D, E, K and R, X4 is D or E and X8 is any amino acid. X1, X2, X3, X5, X6, X7 and X9 are independently selected from G, P, A, V, L, I, M, C, F, Y, W, H, Q, N, S and T. X1, X2, X3, X5, X6, X7 and X9 are preferably not charged. X1, X2, X3, X5, X6, X7 and X9 are preferably not H. The 6 to 12 amino acids in (X8)_6-12 can be the same or different. Preferred MobF motifs III are shown in Table 7 of U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562).
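
By way of illustration only, the MobF motif III and the MobQ motif III may be written as regular expressions with variable-length regions. The following is a minimal sketch, assuming one-letter codes for the 20 standard amino acids and assuming that each residue in (X3)_5-12 may be any amino acid; the variable names and the MobQ test sequence are hypothetical, whereas the three MobF sequences are those listed in this document for TraI Eco and the TrwC helicases.

```python
import re

# "Any amino acid" is taken to mean any of the 20 standard residues (an assumption).
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
# Any amino acid except D, E, K and R.
NOT_DEKR = "[GPAVLIMCFYWHQNST]"

# MobF motif III: H-(X1)2-X2-R-(X3)5-12-H-X4-H
MOBF_MOTIF_III = "H" + ANY + "{2}" + NOT_DEKR + "R" + ANY + "{5,12}" + "H" + NOT_DEKR + "H"
# MobQ motif III: G-X1-X2-X3-X4-X5-X6-X7-H-(X8)6-12-H-X9
MOBQ_MOTIF_III = (
    "G" + NOT_DEKR + "{3}" + "[DE]" + NOT_DEKR + "{3}" + "H" + ANY + "{6,12}" + "H" + NOT_DEKR
)


def is_motif(pattern: str, candidate: str) -> bool:
    """True if the whole candidate string is an instance of the motif."""
    return re.fullmatch(pattern, candidate) is not None


# MobF motif III sequences listed in this document (TraI Eco, TrwC Cba/Eli, TrwC Hne).
for candidate in ("HDTSRDQEPQLHTH", "HDTNRNQEPNLHFH", "HEDARTVDDIADPQLHTH"):
    print(candidate, is_motif(MOBF_MOTIF_III, candidate))

# Hypothetical sequence, used only to exercise the MobQ pattern.
print(is_motif(MOBQ_MOTIF_III, "GAAADAAAH" + "A" * 6 + "HA"))
```

In the listed MobF sequences the (X3) region is 6 residues long in TraI Eco, TrwC Cba and TrwC Eli and 10 residues long in TrwC Hne, in line with the preferred lengths stated above.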

The TraI helicase or TraI subgroup helicase is more preferably one of the helicases shown in Table 6 or 7 of U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562) or a variant thereof. The TraI helicase most preferably comprises the sequence shown in SEQ ID NO: 85 or a variant thereof. SEQ ID NO: 85 is TraI Eco (NCBI Reference Sequence: NP_061483.1; Genbank AAQ98619.1). TraI Eco comprises the following motifs: RecD-like motif I (GYAGVGKT; SEQ ID NO: 86), RecD-like motif V (YAITAHGAQG; SEQ ID NO: 87) and MobF motif III (HDTSRDQEPQLHTH; SEQ ID NO: 88).
The TraI helicase or TraI subgroup helicase more preferably comprises the sequence of one of the helicases shown in Table 4 below, i.e. one of SEQ ID NOs: 85, 126, 134 and 138, or a variant thereof.

TABLE 4

More preferred TraI helicase and TraI subgroup helicases

SEQ ID NO	Name	Strain	NCBI ref	% Identity to TraI Eco	RecD-like motif I (SEQ ID NO:)	RecD-like motif V (SEQ ID NO:)	MobF motif III (SEQ ID NO:)
85	TraI Eco	Escherichia coli	NCBI Reference Sequence: NP_061483.1; Genbank AAQ98619.1	—	GYAGVGKT (86)	YAITAHGAQG (87)	HDTSRDQEPQLHTH (88)
126	TrwC Cba	Citromicrobium bathyomarinum JL354	NCBI Reference Sequence: ZP_06861556.1	15%	GIAGAGKS (131)	YALNVHMAQG (132)	HDTNRNQEPNLHFH (133)
134	TrwC Hne	Halothiobacillus neapolitanus c2	NCBI Reference Sequence: YP_003262832.1	11.5%	GAAGAGKT (135)	YCITIHRSQG (136)	HEDARTVDDIADPQLHTH (137)
138	TrwC Eli	Erythrobacter litoralis HTCC2594	NCBI Reference Sequence: YP_457045.1	16%	GIAGAGKS (131)	YALNAHMAQG (139)	HDTNRNQEPNLHFH (133)

As discussed above for Hel308 helicases, two or more parts on the RecD helicase, TraI helicase or TraI subgroup helicase may be connected to reduce the size of the opening in the polynucleotide binding domain through which a polynucleotide can unbind from the helicase, wherein the helicase retains its ability to control the movement of the polynucleotide. Any of the embodiments discussed above for Hel308 helicases equally apply to RecD helicases, TraI helicases or TraI subgroup helicases. The two or more parts of TrwC Cba that are connected are preferably (a) amino acids 691 and 346 in SEQ ID NO: 126; (b) amino acids 657 and 339 in SEQ ID NO: 126; (c) amino acids 691 and 350 in SEQ ID NO: 126; or (d) amino acids 690 and 350 in SEQ ID NO: 126. These amino acids are preferably substituted with cysteine such that they can be connected by cysteine linkage.
The invention may use a mutant TrwC Cba protein which comprises a variant of SEQ ID NO: 126 in which amino acids 691 and 346; 657 and 339; 691 and 350; or 690 and 350 are modified. The amino acids are preferably substituted. The amino acids are more preferably substituted with cysteine. The variant may differ from SEQ ID NO: 126 at positions other than 691 and 346; 657 and 339; 691 and 350; or 690 and 350 as long as the relevant amino acids are modified. The variant will preferably be at least 10% homologous to SEQ ID NO: 126 based on amino acid identity as discussed in more detail below. In these mutant proteins, amino acids 691 and 346; 657 and 339; 691 and 350; or 690 and 350 are not connected. These mutant TrwC Cba proteins may be used to form a modified helicase in which the modified amino acids are connected.
A variant of a RecD helicase, TraI helicase or TraI subgroup helicase is an enzyme that has an amino acid sequence which varies from that of the wild-type helicase and which retains polynucleotide binding activity. This can be measured as described above. In particular, a variant of SEQ ID NO: 85, 126, 134 or 138 is an enzyme that has an amino acid sequence which varies from that of SEQ ID NO: 85, 126, 134 or 138 and which retains polynucleotide binding activity. The variant retains helicase activity. The variant must work in at least one of the two modes discussed below. Preferably, the variant works in both modes. The variant may include modifications that facilitate handling of the polynucleotide encoding the helicase and/or facilitate its activity at high salt concentrations and/or room temperature. Variants typically differ from the wild-type helicase in regions outside of the motifs discussed above. However, variants may include modifications within these motif(s).
Over the entire length of the amino acid sequence of any one of SEQ ID NO: 85, 126, 134 and 138, a variant will preferably be at least 10% homologous to that sequence based on amino acid identity. More preferably, the variant polypeptide may be at least 20%, at least 25%, at least 30%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of any one of SEQ ID NOs: 85, 126, 134 and 138 over the entire sequence. There may be at least 70%, for example at least 80%, at least 85%, at least 90% or at least 95%, amino acid identity over a stretch of 150 or more, for example 200, 300, 400, 500, 600, 700, 800, 900 or 1000 or more, contiguous amino acids (“hard homology”). Homology is determined as described below. The variant may differ from the wild-type sequence in any of the ways discussed above with reference to SEQ ID NOs: 2 and 4.
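By way of illustration only, the two identity criteria above (identity over the entire sequence, and identity over a contiguous stretch, i.e. "hard homology") may be evaluated as follows. This is a minimal sketch which assumes that the variant and reference sequences have already been aligned to equal length by a separate tool, with "-" marking gaps; it is not the homology determination described below. The function names are hypothetical, and counting identity over alignment columns is a simplifying assumption.

```python
def _check_aligned(a: str, b: str) -> None:
    """Both inputs must be non-empty aligned sequences of equal length."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to equal length")


def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns holding the same residue in both sequences."""
    _check_aligned(a, b)
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def has_identity_stretch(a: str, b: str, window: int = 150, min_identity: float = 70.0) -> bool:
    """True if any run of `window` contiguous columns reaches `min_identity` percent."""
    _check_aligned(a, b)
    if len(a) < window:
        return False
    flags = [int(x == y and x != "-") for x, y in zip(a, b)]
    matches = sum(flags[:window])
    if 100.0 * matches / window >= min_identity:
        return True
    # Slide the window one column at a time, updating the running match count.
    for end in range(window, len(flags)):
        matches += flags[end] - flags[end - window]
        if 100.0 * matches / window >= min_identity:
            return True
    return False


# Hypothetical toy alignment: three of the four columns are identical.
print(percent_identity("ACDE", "ACDF"))
```

A variant with low identity over its entire length may still satisfy the stretch criterion where a single conserved region of 150 or more residues is retained.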
A variant of any one of SEQ ID NOs: 85, 126, 134 and 138 preferably comprises the RecD-like motif I and/or RecD-like motif V of the wild-type sequence. However, a variant of SEQ ID NO: 85, 126, 134 or 138 may comprise the RecD-like motif I and/or extended RecD-like motif V from a different wild-type sequence. For instance, a variant may comprise any one of the preferred motifs shown in Tables 5 and 7 of U.S. Patent Application No. 61/581,332 and International Application No. PCT/GB2012/053274 (published as WO 2012/098562). Variants of SEQ ID NOs: 85, 126, 134 and 138 may also include modifications within the RecD-like motifs I and V of the wild-type sequence. A variant of SEQ ID NO: 85, 126, 134 or 138 preferably comprises one or more substituted cysteine residues and/or one or more substituted Faz residues to facilitate attachment as discussed above.
The helicase is preferably an XPD helicase. Any XPD helicase may be used in accordance with the invention. XPD helicases are also known as Rad3 helicases and the two terms can be used interchangeably.
The structures of XPD helicases are known in the art (Cell. 2008 May 30; 133(5):801-12. Structure of the DNA repair helicase XPD. Liu H, Rudolf J, Johnson K A, McMahon S A, Oke M, Carter L, McRobbie A M, Brown S E, Naismith J H, White M F). The XPD helicase typically comprises the amino acid motif X1-X2-X3-G-X4-X5-X6-E-G (hereinafter called XPD motif V; SEQ ID NO: 89). X1, X2, X5 and X6 are independently selected from any amino acid except D, E, K and R. X1, X2, X5 and X6 are independently selected from G, P, A, V, L, I, M, C, F, Y, W, H, Q, N, S and T. X1, X2, X5 and X6 are preferably not charged. X1, X2, X5 and X6 are preferably not H. X1 is more preferably V, L, I, S or Y. X5 is more preferably V, L, I, N or F. X6 is more preferably S or A. X3 and X4 may be any amino acid residue. X4 is preferably K, R or T.
The XPD helicase typically comprises the amino acid motif Q-Xa-Xb-G-R-Xc-Xd-R-(Xe)₃-Xf-(Xg)₇-D-Xh-R (hereinafter called XPD motif VI; SEQ ID NO: 90). Xa, Xe and Xg may be any amino acid residue. Xb, Xc and Xd are independently selected from any amino acid except D, E, K and R. Xb, Xc and Xd are typically independently selected from G, P, A, V, L, I, M, C, F, Y, W, H, Q, N, S and T. Xb, Xc and Xd are preferably not charged. Xb, Xc and Xd are preferably not H. Xb is more preferably V, A, L, I or M. Xc is more preferably V, A, L, I, M or C. Xd is more preferably I, H, L, F, M or V. Xf may be D or E. (Xg)₇ is X_g1, X_g2, X_g3, X_g4, X_g5, X_g6 and X_g7. X_g2 is preferably G, A, S or C. X_g5 is preferably F, V, L, I, M, A, W or Y. X_g6 is preferably L, F, Y, M, I or V. X_g7 is preferably A, C, V, L, I, M or S.
The XPD helicase preferably comprises XPD motifs V and VI. The most preferred XPD motifs V and VI are shown in Table 5 of U.S. Patent Application No. 61/581,340 and International Application No. PCT/GB2012/053273 (published as WO 2012/098561).
The XPD helicase preferably further comprises an iron sulphide (FeS) core between two Walker A and B motifs (motifs I and II). An FeS core typically comprises an iron atom coordinated between the sulphide groups of cysteine residues. The FeS core is typically tetrahedral.
The XPD helicase is preferably one of the helicases shown in Table 4 or 5 of International Application No. PCT/GB2012/053273 (published as WO 2012/098561) or a variant thereof. The XPD helicase most preferably comprises the sequence shown in SEQ ID NO: 91 or a variant thereof. SEQ ID NO: 91 is XPD Mbu (Methanococcoides burtonii; YP_566221.1; GI:91773529). XPD Mbu comprises YLWGTLSEG (Motif V; SEQ ID NO: 92) and QAMGRVVRSPTDYGARILLDGR (Motif VI; SEQ ID NO: 93).
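By way of illustration only, XPD motifs V and VI may be written as regular expressions and checked against the XPD Mbu motif sequences given above. The following is a minimal sketch, assuming one-letter codes for the 20 standard amino acids and assuming that Xh in motif VI, which is not further defined in the text, may be any amino acid; the variable names are hypothetical.

```python
import re

# "Any amino acid" is taken to mean any of the 20 standard residues (an assumption).
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
# Any amino acid except D, E, K and R.
NOT_DEKR = "[GPAVLIMCFYWHQNST]"

# XPD motif V: X1-X2-X3-G-X4-X5-X6-E-G (X1, X2, X5, X6 exclude D, E, K and R)
XPD_MOTIF_V = NOT_DEKR + "{2}" + ANY + "G" + ANY + NOT_DEKR + "{2}" + "EG"
# XPD motif VI: Q-Xa-Xb-G-R-Xc-Xd-R-(Xe)3-Xf-(Xg)7-D-Xh-R
XPD_MOTIF_VI = (
    "Q" + ANY + NOT_DEKR + "GR" + NOT_DEKR + "{2}" + "R"
    + ANY + "{3}" + "[DE]" + ANY + "{7}" + "D" + ANY + "R"  # Xh treated as any residue
)


def is_motif(pattern: str, candidate: str) -> bool:
    """True if the whole candidate string is an instance of the motif."""
    return re.fullmatch(pattern, candidate) is not None


# XPD Mbu motif sequences given in this document (SEQ ID NOs: 92 and 93).
print(is_motif(XPD_MOTIF_V, "YLWGTLSEG"))
print(is_motif(XPD_MOTIF_VI, "QAMGRVVRSPTDYGARILLDGR"))
```

In the XPD Mbu motif VI sequence, (Xe)₃ corresponds to SPT, Xf to D and (Xg)₇ to YGARILL.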
A variant of a XPD helicase is an enzyme that has an amino acid sequence which varies from that of the wild-type helicase and which retains polynucleotide binding activity. This can be measured as described above. In particular, a variant of SEQ ID NO: 91 is an enzyme that has an amino acid sequence which varies from that of SEQ ID NO: 91 and which retains polynucleotide binding activity. The variant retains helicase activity. The variant must work in at least one of the two modes discussed below. Preferably, the variant works in both modes. The variant may include modifications that facilitate handling of the polynucleotide encoding the helicase and/or facilitate its activity at high salt concentrations and/or room temperature. Variants typically differ from the wild-type helicase in regions outside of XPD motifs V and VI discussed above. However, variants may include modifications within one or both of these motifs.
Over the entire length of the amino acid sequence of SEQ ID NO: 91, such as SEQ ID NO: 10, a variant will preferably be at least 10%, preferably at least 30%, homologous to that sequence based on amino acid identity. More preferably, the variant polypeptide may be at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 91 over the entire sequence. There may be at least 70%, for example at least 80%, at least 85%, at least 90% or at least 95%, amino acid identity over a stretch of 150 or more, for example 200, 300, 400, 500, 600, 700, 800, 900 or 1000 or more, contiguous amino acids (“hard homology”). Homology is determined as described below. The variant may differ from the wild-type sequence in any of the ways discussed above with reference to SEQ ID NOs: 2 and 4.
A variant of SEQ ID NO: 91 preferably comprises the XPD motif V and/or the XPD motif VI of the wild-type sequence. A variant of SEQ ID NO: 91 more preferably comprises both XPD motifs V and VI of SEQ ID NO: 91. However, a variant of SEQ ID NO: 91 may comprise XPD motifs V and/or VI from a different wild-type sequence. For instance, a variant of SEQ ID NO: 91 may comprise any one of the preferred motifs shown in Table 5 of U.S. Patent Application No. 61/581,340 and International Application No. PCT/GB2012/053273 (published as WO 2012/098561). Variants of SEQ ID NO: 91 may also include modifications within XPD motif V and/or XPD motif VI of the wild-type sequence. Suitable modifications to these motifs are discussed above when defining the two motifs. As discussed above for Hel308 helicases, two or more parts on the XPD helicase may be connected to reduce the size of the opening in the polynucleotide binding domain through which a polynucleotide can unbind from the helicase, wherein the helicase retains its ability to control the movement of the polynucleotide. Any of the embodiments discussed above for Hel308 helicases equally apply to XPD helicases. A variant of SEQ ID NO: 91 preferably comprises one or more substituted cysteine residues and/or one or more substituted Faz residues to facilitate attachment as discussed above.
The helicase is preferably a UvrD helicase. Any UvrD helicase may be used in the invention. The UvrD helicase preferably comprises the sequence shown in SEQ ID NO: 122 or a variant thereof. Variants are defined above. Over the entire length of the amino acid sequence of SEQ ID NO: 122, a variant will preferably be at least 20% homologous to that sequence based on amino acid similarity or identity. More preferably, the variant polypeptide may be at least 30%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 122 over the entire sequence. There may be at least 70%, for example at least 80%, at least 85%, at least 90% or at least 95%, amino acid identity over a stretch of 100 or more, for example 150, 200, 300, 400 or 500 or more, contiguous amino acids (“hard homology”). Homology or similarity is determined as described below.
The helicase is preferably a Dda helicase. Any Dda helicase may be used in the invention. Dda helicases typically comprise the following five domains: 1A (RecA-like motor) domain, 2A (RecA-like motor) domain, tower domain, pin domain and hook domain (Xiaoping He et al., 2012, Structure; 20: 1189-1200). The domains may be identified using protein modelling, x-ray diffraction measurement of the protein in a crystalline state (Rupp B (2009). Biomolecular Crystallography: Principles, Practice and Application to Structural Biology. New York: Garland Science), nuclear magnetic resonance (NMR) spectroscopy of the protein in solution (Mark Rance; Cavanagh, John; Wayne J. Fairbrother; Arthur W. Hunt III; Skelton, Nicholas J. (2007). Protein NMR spectroscopy: principles and practice (2nd ed.). Boston: Academic Press.) or cryo-electron microscopy of the protein in a frozen-hydrated state (van Heel M, Gowen B, Matadeen R, Orlova E V, Finn R, Pape T, Cohen D, Stark H, Schmidt R, Schatz M, Patwardhan A (2000). “Single-particle electron cryo-microscopy: towards atomic resolution”. Q Rev Biophys. 33: 307-69). Structural information on proteins determined by the above-mentioned methods is publicly available from the Protein Data Bank (PDB) database.
Preferred Dda helicases are shown in Table 5 below.


TABLE 5

Dda Homologue (SEQ ID NO:)	Organism	Habitat	Uniprot	Length	Sequence Identity to 1993/%	Number of D/E vs. K/R amino acids	# C
Rma-DSM (SEQ ID NO: 98)	Rhodothermus marinus	Mild halophile, moderate thermophile >65° C.	D0MKQ2	678	21	−84/+85	2
Csp (SEQ ID NO: 99)	Cyanothece sp. (strain ATCC 51142)	Marine bacterium	B1X365	496	24	−76/+76	5
Sru (SEQ ID NO: 100)	Salinibacter ruber	Extremely halophilic, 35-45° C.	Q2S429	421	26	−78/+54	3
Sgo (SEQ ID NO: 101)	Sulfurimonas gotlandica GD1	Hydrothermal vents, coastal sediments	B6BJ43	500	27	−72/+64	2
Vph12B8 (SEQ ID NO: 102)	Vibrio phage henriette 12B8	Host found in saltwater, stomach bug	M4MBC3	450	27	−62/+47	6
Vph (SEQ ID NO: 103)	Vibrio phage phi-pp2	Host found in saltwater, stomach bug	I6XGX8	421	39	−55/+45	5
Aph65 (SEQ ID NO: 104)	Aeromonas phage 65	Host found in fresh/brackish water, stomach bug	E5DRP6	434	40	−57/+48	4
AphCC2 (SEQ ID NO: 105)	Aeromonas phage CC2	Host found in fresh/brackish water, stomach bug	I6XH64	420	41	−53/+44	4
Cph (SEQ ID NO: 106)	Cronobacter phage vB CsaM GAP161	Host member of enterobacteriaceae	K4FBD0	443	42	−59/+57	4
Kph (SEQ ID NO: 107)	Klebsiella phage KP15	Host member of enterobacteriaceae	D5JF67	442	44	−59/+58	5
SphlME13 (SEQ ID NO: 108)	Stenotrophomonas phage IME13	Host found in soil	J7HXT5	438	51	−58/+59	7
AphAc42 (SEQ ID NO: 109)	Acinetobacter phage Ac42	Host found in soil	E5EYE6	442	59	−53/+49	9
SphSP18 (SEQ ID NO: 110)	Shigella phage SP18	Host member of enterobacteriaceae	E3SFA5	442	59	−55/+55	9
Yph (SEQ ID NO: 111)	Yersinia phage phiR1-RT	Host member of enterobacteriaceae	I7J3V8	439	64	−52/+52	7
SphS16 (SEQ ID NO: 112)	Salmonella phage S16	Host member of enterobacteriaceae	M1EA88	441	72	−56/+55	5
1993 (SEQ ID NO: 97)	Enterobacteria phage T4	Host member of enterobacteriaceae	P32270	439	100	−57/+58	5
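
By way of illustration only, the composition columns of the table above may be derived from an amino acid sequence by counting residues. The following is a minimal sketch which assumes that the "D/E vs. K/R" column reports the number of acidic (D, E) and basic (K, R) residues and that "# C" reports the number of cysteine residues; the function names and the toy sequence are hypothetical.

```python
def charge_and_cysteine_counts(sequence: str):
    """Return (number of D/E, number of K/R, number of C) for a one-letter sequence."""
    seq = sequence.upper()
    acidic = seq.count("D") + seq.count("E")
    basic = seq.count("K") + seq.count("R")
    cysteines = seq.count("C")
    return acidic, basic, cysteines


def format_charge_column(sequence: str) -> str:
    """Format the acidic/basic counts in the style used in the table, e.g. −57/+58."""
    acidic, basic, _ = charge_and_cysteine_counts(sequence)
    return f"\u2212{acidic}/+{basic}"


# Hypothetical toy sequence; real entries would use the full helicase sequences.
toy = "MDEKRCCA"
print(charge_and_cysteine_counts(toy))
print(format_charge_column(toy))
```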

The Dda helicase more preferably comprises the sequence of one of the helicases shown in Table 5 above, i.e. one of SEQ ID NOs: 97 to 112, or a variant thereof. Variants are defined above. Over the entire length of the amino acid sequence of any one of SEQ ID NOs: 97 to 112, a variant will preferably be at least 20% homologous to that sequence based on amino acid similarity or identity. More preferably, the variant polypeptide may be at least 30%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of any one of SEQ ID NOs: 97 to 112 over the entire sequence. There may be at least 70%, for example at least 80%, at least 85%, at least 90% or at least 95%, amino acid identity over a stretch of 100 or more, for example 150, 200, 300, 400 or 500 or more, contiguous amino acids (“hard homology”). Homology or similarity is determined as described below.
Preferred variants of any one of SEQ ID NOs: 97 to 112 have a non-natural amino acid, such as Faz, at the amino- (N-) terminus and/or carboxy (C-) terminus. Preferred variants of any one of SEQ ID NOs: 97 to 112 have a cysteine residue at the amino- (N-) terminus and/or carboxy (C-) terminus. Preferred variants of any one of SEQ ID NOs: 97 to 112 have a cysteine residue at the amino- (N-) terminus and a non-natural amino acid, such as Faz, at the carboxy (C-) terminus or vice versa. Preferred variants of SEQ ID NO: 97 contain one or more of, such as all of, the following modifications: E54G, D151E, I196N and G357A.
The Dda helicase preferably comprises any of the modifications disclosed in International Application Nos. PCT/GB2014/052736 and PCT/GB2015/052916 (published as WO 2015/055981 and WO 2016/055777).
A preferred variant of SEQ ID NO: 97 comprises (a) E94C and A360C or (b) E94C, A360C, C109A and C136A and then optionally (ΔM1)G1 (i.e. deletion of M1 and then addition of G1). This modification may also be termed M1G. Any of the variants discussed above may further comprise M1G.
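By way of illustration only, substitution notation such as E94C (wild-type residue, 1-based position, replacement residue) and the M1G modification may be applied to a sequence as follows. This is a minimal sketch; the function names are hypothetical and the toy sequence is not SEQ ID NO: 97, which is not reproduced here.

```python
import re

_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(sequence: str, substitutions) -> str:
    """Apply substitutions such as 'E94C', checking each wild-type residue first."""
    residues = list(sequence)
    for text in substitutions:
        match = _SUBSTITUTION.fullmatch(text)
        if match is None:
            raise ValueError(f"unrecognised substitution: {text}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position out of range: {text}")
        # Positions refer to the unmodified sequence, so check against the original.
        if sequence[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}: {text}")
        residues[position - 1] = replacement
    return "".join(residues)


def apply_m1g(sequence: str) -> str:
    """(ΔM1)G1: delete the N-terminal methionine, then add glycine in its place."""
    if not sequence.startswith("M"):
        raise ValueError("sequence does not start with M")
    return "G" + sequence[1:]


# Hypothetical toy sequence, used only to show the notation.
variant = apply_substitutions("MKECA", ["E3C", "C4A"])
print(variant, apply_m1g(variant))
```

Checking the wild-type residue before substituting guards against numbering that is offset from the reference sequence, for example where a tag has been added or M1 has already been removed.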
As discussed above for Hel308 helicases, two or more parts on the Dda helicase may be connected to reduce the size of the opening in the polynucleotide binding domain through which a polynucleotide can unbind from the helicase, wherein the helicase retains its ability to control the movement of the polynucleotide. Any of the embodiments discussed above for Hel308 helicases equally apply to Dda helicases.
The translocase is preferably a strippase. The strippase is preferably the INO80 chromatin remodeling complex or a FtsK/SpoIIIE transporter.
In one embodiment, the translocase is contacted with the constructs after they are created by the MuA transposase. In another embodiment, the translocase is bound to the substrates before the substrates are contacted with the template polynucleotide.

Hairpin Loops

After fragmentation of the template polynucleotide and ligation of the MuA substrate to the fragments of the template polynucleotide (tagmentation), constructs comprising a fragment of the template polynucleotide and one or more MuA substrates are formed. The two strands of each construct are preferably linked at one end by a hairpin loop. In this embodiment, a hairpin loop is added to each of the fragments of the template polynucleotide generated by the MuA transposase. Suitable hairpin loops can be designed using methods known in the art. The hairpin loop may be any length. The hairpin loop is typically 110 or fewer nucleotides, such as 100 or fewer nucleotides, 90 or fewer nucleotides, 80 or fewer nucleotides, 70 or fewer nucleotides, 60 or fewer nucleotides, 50 or fewer nucleotides, 40 or fewer nucleotides, 30 or fewer nucleotides, 20 or fewer nucleotides or 10 or fewer nucleotides, in length. The hairpin loop is preferably from about 1 to 110, from 2 to 100, from 5 to 80 or from 6 to 50 nucleotides in length. Longer lengths of the hairpin loop, such as from 50 to 110 nucleotides, are preferred if the loop is involved in the differential selectability of the adaptor. Similarly, shorter lengths of the hairpin loop, such as from 1 to 5 nucleotides, are preferred if the loop is not involved in the selectable binding as discussed below.
The hairpin loop preferably comprises a selectable binding moiety. This allows the constructs to be purified or isolated. A selectable binding moiety is a moiety that can be selected on the basis of its binding properties. Hence, a selectable binding moiety is preferably a moiety that specifically binds to a surface. A selectable binding moiety specifically binds to a surface if it binds to the surface to a much greater degree than any other moiety used in the invention. In preferred embodiments, the moiety binds to a surface to which no other moiety used in the invention binds.
Suitable selectable binding moieties are known in the art. Preferred selectable binding moieties include, but are not limited to, biotin, a polynucleotide sequence, antibodies, antibody fragments, such as Fab and scFv, antigens, polynucleotide binding proteins, poly histidine tails and GST tags. The most preferred selectable binding moieties are biotin and a selectable polynucleotide sequence. Biotin specifically binds to a surface coated with avidins. Selectable polynucleotide sequences specifically bind (i.e. hybridise) to a surface coated with homologous sequences. Alternatively, selectable polynucleotide sequences specifically bind to a surface coated with polynucleotide binding proteins.
The hairpin loop and/or the selectable binding moiety may comprise a region that can be cut, nicked, cleaved or hydrolysed. Such a region can be designed to allow the constructs to be removed from the surface to which they are bound following purification or isolation. Suitable regions are known in the art. Suitable regions include, but are not limited to, an RNA region, a region comprising desthiobiotin and streptavidin, a disulphide bond and a photocleavable region.
The hairpin loop may be provided at either end of the polynucleotide, i.e. the 5′ or the 3′ end. The hairpin loop may be ligated to the polynucleotide using any method known in the art. The hairpin loop may be ligated using a ligase, such as T4 DNA ligase, E. coli DNA ligase, Taq DNA ligase, Tma DNA ligase and 9° N DNA ligase. The hairpin loop may be added to the constructs as described in International Application No. PCT/GB2014/052505 (published as WO 2015/022544).

Molecular Brakes

The method preferably further comprises attaching one or more molecular brakes to a non-substrate strand. A non-substrate strand is a strand of a MuA double stranded substrate that does not comprise an overhang. The molecular brakes may be attached to the non-substrate strands in the substrates before they are contacted with the template polynucleotide and the MuA transposase. The molecular brakes may be attached to the other strands from the substrates remaining in the constructs after they are created by the MuA transposase.
The molecular brakes are preferably bound to Y adaptors comprising a leader sequence and/or one or more anchors capable of coupling the adaptor to a membrane and the Y adaptors are attached to the other strands in step (c).
The Y adaptors are typically polynucleotide adaptors. They may be formed from any of the polynucleotides discussed above.
The Y adaptor typically comprises (a) a double stranded region and (b) a single stranded region or a region that is not complementary at the other end. The Y adaptor may be described as having an overhang if it comprises a single stranded region. The presence of a non-complementary region in the Y adaptor gives the adaptor its Y shape since the two strands typically do not hybridise to each other unlike the double stranded portion. The Y adaptor may comprise one or more anchors.
The Y adaptor and/or the hairpin loop may be ligated to the polynucleotide using any method known in the art. One or both of the adaptors may be ligated using a ligase, such as T4 DNA ligase, E. coli DNA ligase, Taq DNA ligase, Tma DNA ligase and 9° N DNA ligase. Alternatively, the adaptors may be added to the constructs as described in International Application No. PCT/GB2014/052505 (published as WO 2015/022544).
The Y adaptor may be provided with a leader sequence which preferentially threads into the pore. The leader sequence facilitates the method of the invention. The leader sequence is designed to preferentially thread into the transmembrane pore and thereby facilitate the movement of polynucleotide through the pore. The leader sequence can also be used to link the polynucleotide to the one or more anchors as discussed below.
The leader sequence typically comprises a polymer. The polymer is preferably negatively charged. The polymer is preferably a polynucleotide, such as DNA or RNA, a modified polynucleotide (such as abasic DNA), PNA, LNA, polyethylene glycol (PEG) or a polypeptide. The leader preferably comprises a polynucleotide and more preferably comprises a single stranded polynucleotide. The leader sequence can comprise any of the polynucleotides discussed above. The single stranded leader sequence most preferably comprises a single strand of DNA, such as a poly dT section. The leader sequence preferably comprises the one or more spacers.
The leader sequence can be any length, but is typically 10 to 150 nucleotides in length, such as from 20 to 150 nucleotides in length. The length of the leader typically depends on the transmembrane pore used in the method.
The Y adaptor preferably comprises a selectable binding moiety as discussed above. The Y adaptor and/or the selectable binding moiety may comprise a region that can be cut, nicked, cleaved or hydrolysed as discussed above.
The method comprises contacting the target polynucleotide with a molecular brake which controls the movement of the target polynucleotide through the pore. Any molecular brake may be used including any of those disclosed in International Application No. PCT/GB2014/052737 (published as WO 2015/110777).
The molecular brake is preferably a polynucleotide binding protein. The polynucleotide binding protein may be any protein that is capable of binding to the polynucleotide and controlling its movement through a transmembrane pore as discussed in more detail below. It is straightforward in the art to determine whether or not a protein binds to a polynucleotide. The protein typically interacts with and modifies at least one property of the polynucleotide. The protein may modify the polynucleotide by cleaving it to form individual nucleotides or shorter chains of nucleotides, such as di- or trinucleotides. The protein may modify the polynucleotide by orienting it or moving it to a specific position, i.e. controlling its movement.
The polynucleotide binding protein is preferably derived from a polynucleotide handling enzyme. A polynucleotide handling enzyme is a polypeptide that is capable of interacting with and modifying at least one property of a polynucleotide. The enzyme may modify the polynucleotide by cleaving it to form individual nucleotides or shorter chains of nucleotides, such as di- or trinucleotides. The enzyme may modify the polynucleotide by orienting it or moving it to a specific position. The polynucleotide handling enzyme does not need to display enzymatic activity as long as it is capable of binding the polynucleotide and controlling its movement through the pore. For instance, the enzyme may be modified to remove its enzymatic activity or may be used under conditions which prevent it from acting as an enzyme. Such conditions are discussed in more detail below.
The polynucleotide handling enzyme is preferably derived from a nucleolytic enzyme. The polynucleotide handling enzyme used in the construct of the enzyme is more preferably derived from a member of any of the Enzyme Classification (EC) groups 3.1.11, 3.1.13, 3.1.14, 3.1.15, 3.1.16, 3.1.21, 3.1.22, 3.1.25, 3.1.26, 3.1.27, 3.1.30 and 3.1.31. The enzyme may be any of those disclosed in International Application No. PCT/GB10/000133 (published as WO 2010/086603).
Preferred enzymes are polymerases, exonucleases, helicases, translocases and topoisomerases, such as gyrases. Suitable enzymes include, but are not limited to, exonuclease I from E. coli (SEQ ID NO: 11), exonuclease III enzyme from E. coli (SEQ ID NO: 13), RecJ from T. thermophilus (SEQ ID NO: 15) and bacteriophage lambda exonuclease (SEQ ID NO: 17), TatD exonuclease and variants thereof. Three subunits comprising the sequence shown in SEQ ID NO: or a variant thereof interact to form a trimer exonuclease. The polymerase may be PyroPhage® 3173 DNA Polymerase (which is commercially available from Lucigen® Corporation), SD Polymerase (commercially available from Bioron®) or variants thereof. The enzyme is preferably Phi29 DNA polymerase (SEQ ID NO: 9) or a variant thereof. The topoisomerase is preferably a member of any of the Enzyme Classification (EC) groups 5.99.1.2 and 5.99.1.3.
The enzyme is most preferably derived from a helicase. The helicase may be or be derived from a Hel308 helicase, a RecD helicase, such as TraI helicase or a TrwC helicase, a XPD helicase or a Dda helicase. The helicase may be or be derived from Hel308 Mbu (SEQ ID NO: 18), Hel308 Csy (SEQ ID NO: 19), Hel308 Tga (SEQ ID NO: 20), Hel308 Mhu (SEQ ID NO: 21), TraI Eco (SEQ ID NO: 22), XPD Mbu (SEQ ID NO: 23) or a variant thereof.
The helicase may be any of the helicases, modified helicases or helicase constructs disclosed in International Application Nos. PCT/GB2012/052579 (published as WO 2013/057495); PCT/GB2012/053274 (published as WO 2013/098562); PCT/GB2012/053273 (published as WO 2013/098561); PCT/GB2013/051925 (published as WO 2014/013260); PCT/GB2013/051924 (published as WO 2014/013259); PCT/GB2013/051928 (published as WO 2014/013262) and PCT/GB2014/052736 (published as WO 2015/055981).
The helicase preferably comprises the sequence shown in SEQ ID NO: 25 (TrwC Cba) or a variant thereof, the sequence shown in SEQ ID NO: 18 (Hel308 Mbu) or a variant thereof or the sequence shown in SEQ ID NO: 24 (Dda) or a variant thereof. Variants may differ from the native sequences in any of the ways discussed below for transmembrane pores. A preferred variant of SEQ ID NO: 24 comprises (a) E94C and A360C or (b) E94C, A360C, C109A and C136A and then optionally (ΔM1)G1 (i.e. deletion of M1 and then addition of G1). It may also be termed M1G. Any of the variants discussed above may further comprise M1G.
The Dda helicase preferably comprises any of the modifications disclosed in International Application Nos. PCT/GB2014/052736 and PCT/GB2015/052916 (published as WO 2015/055981 and WO 2016/055777).
Any number of helicases may be used in accordance with the invention. For instance, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more helicases may be used. In some embodiments, different numbers of helicases may be used.
The method of the invention preferably comprises attaching two or more helicases to the other strands. The two or more helicases are typically the same helicase. The two or more helicases may be different helicases.
The two or more helicases may be any combination of the helicases mentioned above. The two or more helicases may be two or more Dda helicases. The two or more helicases may be one or more Dda helicases and one or more TrwC helicases. The two or more helicases may be different variants of the same helicase.
The two or more helicases are preferably attached to one another. The two or more helicases are more preferably covalently attached to one another. The helicases may be attached in any order and using any method. Preferred helicase constructs for use in the invention are described in International Application Nos. PCT/GB2013/051925 (published as WO 2014/013260); PCT/GB2013/051924 (published as WO 2014/013259); PCT/GB2013/051928 (published as WO 2014/013262) and PCT/GB2014/052736.
A variant of SEQ ID NO: 9, 11, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24 or 25 is an enzyme that has an amino acid sequence which varies from that of SEQ ID NO: 9, 11, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24 or 25 and which retains polynucleotide binding ability. This can be measured using any method known in the art. For instance, the variant can be contacted with a polynucleotide and its ability to bind to and move along the polynucleotide can be measured. The variant may include modifications that facilitate binding of the polynucleotide and/or facilitate its activity at high salt concentrations and/or room temperature. Variants may be modified such that they bind polynucleotides (i.e. retain polynucleotide binding ability) but do not function as a helicase (i.e. do not move along polynucleotides when provided with all the necessary components to facilitate movement, e.g. ATP and Mg²⁺). Such modifications are known in the art. For instance, modification of the Mg²⁺ binding domain in helicases typically results in variants which do not function as helicases. These types of variants may act as molecular brakes (see below).
Over the entire length of the amino acid sequence of SEQ ID NO: 9, 11, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24 or 25, a variant will preferably be at least 50% homologous to that sequence based on amino acid identity. More preferably, the variant polypeptide may be at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 9, 11, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24 or 25 over the entire sequence. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid identity over a stretch of 200 or more, for example 230, 250, 270, 280, 300, 400, 500, 600, 700, 800, 900 or 1000 or more, contiguous amino acids (“hard homology”). Homology is determined as described above. The variant may differ from the wild-type sequence in any of the ways discussed above with reference to SEQ ID NO: 2 and 4 above. The enzyme may be covalently attached to the pore. Any method may be used to covalently attach the enzyme to the pore.
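The identity thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by some external tool (gaps written as "-"), and it takes the alignment length as the denominator; both choices are assumptions made for illustration and are not the homology determination referred to in this description. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns that hold the same residue.

    Assumption: inputs are pre-aligned and of equal length, with '-'
    marking a gap. A gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def best_window_identity(aligned_a: str, aligned_b: str, window: int) -> float:
    """Highest identity found over any contiguous stretch of `window` columns.

    Simplification: the stretch is counted in alignment columns rather
    than in ungapped residues.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or window > len(aligned_a):
        raise ValueError("window must be between 1 and the alignment length")
    return max(
        percent_identity(aligned_a[i:i + window], aligned_b[i:i + window])
        for i in range(len(aligned_a) - window + 1)
    )
```

Under these assumptions, a candidate variant would satisfy the first criterion if percent_identity(...) over the full alignment is at least 50, and the contiguous-stretch criterion if best_window_identity(..., 200) is at least 80.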
A preferred molecular brake is TrwC Cba-Q594A (SEQ ID NO: 25 with the mutation Q594A). This variant does not function as a helicase (i.e. binds polynucleotides but does not move along them when provided with all the necessary components to facilitate movement, e.g. ATP and Mg²⁺).
In strand sequencing, the polynucleotide is translocated through the pore either with or against an applied potential. Exonucleases that act progressively or processively on double stranded polynucleotides can be used on the cis side of the pore to feed the remaining single strand through under an applied potential or the trans side under a reverse potential. Likewise, a helicase that unwinds the double stranded DNA can also be used in a similar manner. A polymerase may also be used. There are also possibilities for sequencing applications that require strand translocation against an applied potential, but the DNA must be first “caught” by the enzyme under a reverse or no potential. With the potential then switched back following binding the strand will pass cis to trans through the pore and be held in an extended conformation by the current flow. The single strand DNA exonucleases or single strand DNA dependent polymerases can act as molecular motors to pull the recently translocated single strand back through the pore in a controlled stepwise manner, trans to cis, against the applied potential.
Any helicase may be used in the method. Helicases may work in two modes with respect to the pore. First, the method is preferably carried out using a helicase such that it moves the polynucleotide through the pore with the field resulting from the applied voltage. In this mode the 5′ end of the polynucleotide is first captured in the pore, and the helicase moves the polynucleotide into the pore such that it is passed through the pore with the field until it finally translocates through to the trans side of the membrane. Alternatively, the method is preferably carried out such that a helicase moves the polynucleotide through the pore against the field resulting from the applied voltage. In this mode the 3′ end of the polynucleotide is first captured in the pore, and the helicase moves the polynucleotide through the pore such that it is pulled out of the pore against the applied field until finally ejected back to the cis side of the membrane.
The method may also be carried out in the opposite direction. The 3′ end of the polynucleotide may be first captured in the pore and the helicase may move the polynucleotide into the pore such that it is passed through the pore with the field until it finally translocates through to the trans side of the membrane.
When the helicase is not provided with the necessary components to facilitate movement or is modified to hinder or prevent its movement, it can bind to the polynucleotide and act as a brake slowing the movement of the polynucleotide when it is pulled into the pore by the applied field. In the inactive mode, it does not matter whether the polynucleotide is captured either 3′ or 5′ down, it is the applied field which pulls the polynucleotide into the pore towards the trans side with the enzyme acting as a brake. When in the inactive mode, the movement control of the polynucleotide by the helicase can be described in a number of ways including ratcheting, sliding and braking. Helicase variants which lack helicase activity can also be used in this way.
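The modes described in the preceding paragraphs can be summarised as a small lookup. The sketch below is illustrative only: the function name, argument names and returned labels are hypothetical, and a combination that the description does not mention returns None rather than a guess.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    driver: str     # what sets the direction of travel
    exit_side: str  # side of the membrane the strand ends up on
    role: str       # role played by the helicase


def translocation_outcome(
    captured_end: str, helicase_active: bool, with_field: bool = True
) -> Optional[Outcome]:
    """Summarise the helicase modes described in the text.

    captured_end: "5'" or "3'", the end first captured in the pore.
    helicase_active: False when the helicase lacks the components needed
        for movement or is modified so that it cannot move.
    with_field: whether an active helicase moves the strand with the
        field (ignored in the inactive mode).
    """
    if captured_end not in ("5'", "3'"):
        raise ValueError("captured_end must be \"5'\" or \"3'\"")
    if not helicase_active:
        # Inactive mode: the captured end does not matter; the field pulls
        # the strand towards the trans side and the enzyme acts as a brake.
        return Outcome("applied field", "trans", "brake")
    if with_field:
        # Described for 5' capture and, in the opposite direction, 3' capture.
        return Outcome("helicase, with the field", "trans", "motor")
    if captured_end == "3'":
        # Strand is pulled out of the pore and ejected back to the cis side.
        return Outcome("helicase, against the field", "cis", "motor")
    return None  # 5' capture against the field is not described in the text
```

For example, translocation_outcome("3'", True, with_field=False) reports ejection to the cis side, while translocation_outcome("3'", False) reports braking towards the trans side.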
The molecular brake may function as the translocase that removes the MuA transposase. Preferably, the molecular brake is used in addition to a translocase. The molecular brake and translocase may be the same enzyme or different enzymes. Where the molecular brake and translocase are the same enzyme, one molecule of the enzyme may act as a molecular brake and another molecule of the enzyme may act as a translocase to remove the MuA transposase.
The polynucleotide may be contacted with the molecular brake and the pore in any order. It is preferred that, when the polynucleotide is contacted with the molecular brake, such as a helicase, and the pore, the polynucleotide firstly forms a complex with the protein. When the voltage is applied across the pore, the polynucleotide/protein complex then forms a complex with the pore and controls the movement of the polynucleotide through the pore.
Any steps in the method using a polynucleotide binding protein are typically carried out in the presence of free nucleotides or free nucleotide analogues and an enzyme cofactor that facilitates the action of the polynucleotide binding protein. The free nucleotides may be one or more of any of the individual nucleotides discussed above. The free nucleotides include, but are not limited to, adenosine monophosphate (AMP), adenosine diphosphate (ADP), adenosine triphosphate (ATP), guanosine monophosphate (GMP), guanosine diphosphate (GDP), guanosine triphosphate (GTP), thymidine monophosphate (TMP), thymidine diphosphate (TDP), thymidine triphosphate (TTP), uridine monophosphate (UMP), uridine diphosphate (UDP), uridine triphosphate (UTP), cytidine monophosphate (CMP), cytidine diphosphate (CDP), cytidine triphosphate (CTP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyadenosine diphosphate (dADP), deoxyadenosine triphosphate (dATP), deoxyguanosine monophosphate (dGMP), deoxyguanosine diphosphate (dGDP), deoxyguanosine triphosphate (dGTP), deoxythymidine monophosphate (dTMP), deoxythymidine diphosphate (dTDP), deoxythymidine triphosphate (dTTP), deoxyuridine monophosphate (dUMP), deoxyuridine diphosphate (dUDP), deoxyuridine triphosphate (dUTP), deoxycytidine monophosphate (dCMP), deoxycytidine diphosphate (dCDP) and deoxycytidine triphosphate (dCTP). The free nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP or dCMP. The free nucleotides are preferably adenosine triphosphate (ATP). The enzyme cofactor is a factor that allows the construct to function. The enzyme cofactor is preferably a divalent metal cation. The divalent metal cation is preferably Mg²⁺, Mn²⁺, Ca²⁺ or Co²⁺. The enzyme cofactor is most preferably Mg²⁺.
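The abbreviations in the list of free nucleotides above follow a regular pattern (base letter, phosphate count, optional deoxy prefix, plus two cyclic monophosphates). As a minimal sketch, with a hypothetical function name, the same 32 abbreviations can be generated in the order in which they are listed:

```python
BASES = ["A", "G", "T", "U", "C"]   # adenosine, guanosine, thymidine, uridine, cytidine
PHOSPHATES = ["MP", "DP", "TP"]     # mono-, di- and triphosphate


def free_nucleotide_abbreviations() -> list:
    """Return the abbreviations of the free nucleotides listed in the text."""
    ribo = [base + p for base in BASES for p in PHOSPHATES]
    cyclic = ["cAMP", "cGMP"]
    deoxy = ["d" + base + p for base in BASES for p in PHOSPHATES]
    return ribo + cyclic + deoxy


if __name__ == "__main__":
    names = free_nucleotide_abbreviations()
    print(len(names), names[:3], names[-1])
```

The sketch only reproduces the naming pattern; it says nothing about which nucleotide suits a given polynucleotide binding protein.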
The molecular brakes may be any compound or molecule which binds to the polynucleotide and slows the movement of the polynucleotide through the pore. The molecular brake may be any of those discussed above. The molecular brake preferably comprises a compound which binds to the polynucleotide. The compound is preferably a macrocycle. Suitable macrocycles include, but are not limited to, cyclodextrins, calixarenes, cyclic peptides, crown ethers, cucurbiturils, pillararenes, derivatives thereof or a combination thereof. The cyclodextrin or derivative thereof may be any of those disclosed in Eliseev, A. V., and Schneider, H-J. (1994) J. Am. Chem. Soc. 116, 6081-6088. The cyclodextrin is more preferably heptakis-6-amino-β-cyclodextrin (am₇-βCD), 6-monodeoxy-6-monoamino-β-cyclodextrin (am₁-βCD) or heptakis-(6-deoxy-6-guanidino)-cyclodextrin (gu₇-βCD).

Lack of Heating

The method of the invention preferably does not comprise heat inactivating the MuA transposase. Heat inactivation may also inactivate any other enzymes or proteins being used in the preparation or characterisation of the modified polynucleotides. Removing the heat inactivation step also dispenses with the need for additional equipment required for heating, such as a thermal cycler, hot block, or water bath, used for heating up the sample. The method of the invention can therefore be used in a variety of different settings including those without an electricity supply.

Products of the Invention

The invention also provides a population of double stranded MuA substrates for modifying a template polynucleotide, wherein each substrate comprises an overhang at one or both ends and a translocase bound to an overhang. Any of the embodiments discussed above equally apply to the population of the invention.
The invention also provides a plurality of polynucleotides modified using the method of the invention. The plurality of polynucleotides may be in any of the forms discussed above.
The population or plurality may be isolated, substantially isolated, purified or substantially purified. A population or plurality is isolated or purified if it is completely free of any other components, such as the template polynucleotide, lipids or pores. A population or plurality is substantially isolated if it is mixed with carriers or diluents which will not interfere with its intended use. For instance, a population or plurality is substantially isolated or substantially purified if it is present in a form that comprises less than 10%, less than 5%, less than 2% or less than 1% of other components, such as lipids or pores.

Characterisation Method of the Invention

The invention also comprises a method of characterising at least one polynucleotide modified using a method of the invention. The modified polynucleotide is contacted with a transmembrane pore such that at least one strand of the polynucleotide moves through the pore. One or more measurements which are indicative of one or more characteristics of the polynucleotide are taken as the at least one strand moves with respect to the pore.
The invention also provides a method of characterising a template polynucleotide. The template polynucleotide is modified using the method of the invention to produce a plurality of modified polynucleotides. Each modified polynucleotide is contacted with a transmembrane pore such that at least one strand of each polynucleotide moves through the pore. One or more measurements which are indicative of one or more characteristics of the polynucleotide are taken as the at least one strand of each polynucleotide moves with respect to the pore.
If the/each modified polynucleotide comprises a hairpin loop, the method preferably comprises contacting the/each modified polynucleotide with a transmembrane pore such that both strands of the polynucleotide move through the pore. If molecular brakes are present on the/each modified polynucleotide, the molecular brakes may control the movement of the/each modified polynucleotide through the pore and/or separate the two strands of the/each modified polynucleotide.

Membrane

The transmembrane pore is typically in a membrane. Any membrane may be used in accordance with the invention. Suitable membranes are well-known in the art. The membrane is preferably an amphiphilic layer. An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties. The amphiphilic molecules may be synthetic or naturally occurring. Non-naturally occurring amphiphiles and amphiphiles which form a monolayer are known in the art and include, for example, block copolymers (Gonzalez-Perez et al., Langmuir, 2009, 25, 10447-10450). Block copolymers are polymeric materials in which two or more monomer sub-units are polymerised together to create a single polymer chain. Block copolymers typically have properties that are contributed by each monomer sub-unit. However, a block copolymer may have unique properties that polymers formed from the individual sub-units do not possess. Block copolymers can be engineered such that one of the monomer sub-units is hydrophobic (i.e. lipophilic), whilst the other sub-unit(s) are hydrophilic whilst in aqueous media. In this case, the block copolymer may possess amphiphilic properties and may form a structure that mimics a biological membrane. The block copolymer may be a diblock (consisting of two monomer sub-units), but may also be constructed from more than two monomer sub-units to form more complex arrangements that behave as amphiphiles. The copolymer may be a triblock, tetrablock or pentablock copolymer. The membrane is preferably a triblock copolymer membrane.
Archaebacterial bipolar tetraether lipids are naturally occurring lipids that are constructed such that the lipid forms a monolayer membrane. These lipids are generally found in extremophiles that survive in harsh biological environments, such as thermophiles, halophiles and acidophiles. Their stability is believed to derive from the fused nature of the final bilayer. It is straightforward to construct block copolymer materials that mimic these biological entities by creating a triblock polymer that has the general motif hydrophilic-hydrophobic-hydrophilic. This material may form monomeric membranes that behave similarly to lipid bilayers and encompass a range of phase behaviours from vesicles through to laminar membranes. Membranes formed from these triblock copolymers hold several advantages over biological lipid membranes. Because the triblock copolymer is synthesised, the exact construction can be carefully controlled to provide the correct chain lengths and properties required to form membranes and to interact with pores and other proteins.
Block copolymers may also be constructed from sub-units that are not classed as lipid sub-materials; for example a hydrophobic polymer may be made from siloxane or other non-hydrocarbon based monomers. The hydrophilic sub-section of block copolymer can also possess low protein binding properties, which allows the creation of a membrane that is highly resistant when exposed to raw biological samples. This head group unit may also be derived from non-classical lipid head-groups.
Triblock copolymer membranes also have increased mechanical and environmental stability compared with biological lipid membranes, for example a much higher operational temperature or pH range. The synthetic nature of the block copolymers provides a platform to customise polymer based membranes for a wide range of applications.
The membrane is most preferably one of the membranes disclosed in International Application No. PCT/GB2013/052766 or PCT/GB2013/052767.
The amphiphilic molecules may be chemically-modified or functionalised to facilitate coupling of the polynucleotide.
The amphiphilic layer may be a monolayer or a bilayer. The amphiphilic layer is typically planar. The amphiphilic layer may be curved. The amphiphilic layer may be supported. The amphiphilic layer may be concave. The amphiphilic layer may be suspended from raised pillars such that the peripheral region of the amphiphilic layer is higher than the amphiphilic layer region in the centre. This may allow the microparticle to travel, move, slide or roll along the membrane as described above.
Amphiphilic membranes are typically naturally mobile, essentially acting as two dimensional fluids with lipid diffusion rates of approximately 10⁻⁸ cm s⁻¹. This means that the pore and coupled polynucleotide can typically move within an amphiphilic membrane.
The membrane may be a lipid bilayer. Lipid bilayers are models of cell membranes and serve as excellent platforms for a range of experimental studies. For example, lipid bilayers can be used for in vitro investigation of membrane proteins by single-channel recording. Alternatively, lipid bilayers can be used as biosensors to detect the presence of a range of substances. The lipid bilayer may be any lipid bilayer. Suitable lipid bilayers include, but are not limited to, a planar lipid bilayer, a supported bilayer or a liposome. The lipid bilayer is preferably a planar lipid bilayer. Suitable lipid bilayers are disclosed in International Application No. PCT/GB08/000563 (published as WO 2008/102121), International Application No. PCT/GB08/004127 (published as WO 2009/077734) and International Application No. PCT/GB2006/001057 (published as WO 2006/100484).
Methods for forming lipid bilayers are known in the art. Lipid bilayers are commonly formed by the method of Montal and Mueller (Proc. Natl. Acad. Sci. USA., 1972; 69: 3561-3566), in which a lipid monolayer is carried on aqueous solution/air interface past either side of an aperture which is perpendicular to that interface. The lipid is normally added to the surface of an aqueous electrolyte solution by first dissolving it in an organic solvent and then allowing a drop of the solvent to evaporate on the surface of the aqueous solution on either side of the aperture. Once the organic solvent has evaporated, the solution/air interfaces on either side of the aperture are physically moved up and down past the aperture until a bilayer is formed. Planar lipid bilayers may be formed across an aperture in a membrane or across an opening into a recess.
The method of Montal & Mueller is popular because it is a cost-effective and relatively straightforward method of forming good quality lipid bilayers that are suitable for protein pore insertion. Other common methods of bilayer formation include tip-dipping, painting bilayers and patch-clamping of liposome bilayers.
Tip-dipping bilayer formation entails touching the aperture surface (for example, a pipette tip) onto the surface of a test solution that is carrying a monolayer of lipid. Again, the lipid monolayer is first generated at the solution/air interface by allowing a drop of lipid dissolved in organic solvent to evaporate at the solution surface. The bilayer is then formed by the Langmuir-Schaefer process and requires mechanical automation to move the aperture relative to the solution surface.
For painted bilayers, a drop of lipid dissolved in organic solvent is applied directly to the aperture, which is submerged in an aqueous test solution. The lipid solution is spread thinly over the aperture using a paintbrush or an equivalent. Thinning of the solvent results in formation of a lipid bilayer. However, complete removal of the solvent from the bilayer is difficult and consequently the bilayer formed by this method is less stable and more prone to noise during electrochemical measurement.
Patch-clamping is commonly used in the study of biological cell membranes. The cell membrane is clamped to the end of a pipette by suction and a patch of the membrane becomes attached over the aperture. The method has been adapted for producing lipid bilayers by clamping liposomes which then burst to leave a lipid bilayer sealing over the aperture of the pipette. The method requires stable, giant and unilamellar liposomes and the fabrication of small apertures in materials having a glass surface.
Liposomes can be formed by sonication, extrusion or the Mozafari method (Colas et al. (2007) Micron 38:841-847).
In a preferred embodiment, the lipid bilayer is formed as described in International Application No. PCT/GB08/004127 (published as WO 2009/077734). Advantageously in this method, the lipid bilayer is formed from dried lipids. In a most preferred embodiment, the lipid bilayer is formed across an opening as described in WO2009/077734 (PCT/GB08/004127).
A lipid bilayer is formed from two opposing layers of lipids. The two layers of lipids are arranged such that their hydrophobic tail groups face towards each other to form a hydrophobic interior. The hydrophilic head groups of the lipids face outwards towards the aqueous environment on each side of the bilayer. The bilayer may be present in a number of lipid phases including, but not limited to, the liquid disordered phase (fluid lamellar), liquid ordered phase, solid ordered phase (lamellar gel phase, interdigitated gel phase) and planar bilayer crystals (lamellar sub-gel phase, lamellar crystalline phase).
Any lipid composition that forms a lipid bilayer may be used. The lipid composition is chosen such that a lipid bilayer having the required properties, such as surface charge, ability to support membrane proteins, packing density or mechanical properties, is formed. The lipid composition can comprise one or more different lipids. For instance, the lipid composition can contain up to 100 lipids. The lipid composition preferably contains 1 to 10 lipids. The lipid composition may comprise naturally-occurring lipids and/or artificial lipids.
The lipids typically comprise a head group, an interfacial moiety and two hydrophobic tail groups which may be the same or different. Suitable head groups include, but are not limited to, neutral head groups, such as diacylglycerides (DG) and ceramides (CM); zwitterionic head groups, such as phosphatidylcholine (PC), phosphatidylethanolamine (PE) and sphingomyelin (SM); negatively charged head groups, such as phosphatidylglycerol (PG), phosphatidylserine (PS), phosphatidylinositol (PI), phosphatidic acid (PA) and cardiolipin (CA); and positively charged headgroups, such as trimethylammonium-Propane (TAP). Suitable interfacial moieties include, but are not limited to, naturally-occurring interfacial moieties, such as glycerol-based or ceramide-based moieties. Suitable hydrophobic tail groups include, but are not limited to, saturated hydrocarbon chains, such as lauric acid (n-Dodecanoic acid), myristic acid (n-Tetradecanoic acid), palmitic acid (n-Hexadecanoic acid), stearic acid (n-Octadecanoic) and arachidic (n-Eicosanoic); unsaturated hydrocarbon chains, such as oleic acid (cis-9-Octadecenoic); and branched hydrocarbon chains, such as phytanoyl. The length of the chain and the position and number of the double bonds in the unsaturated hydrocarbon chains can vary. The length of the chains and the position and number of the branches, such as methyl groups, in the branched hydrocarbon chains can vary. The hydrophobic tail groups can be linked to the interfacial moiety as an ether or an ester. The lipids may be mycolic acid.
The lipids can also be chemically-modified. The head group or the tail group of the lipids may be chemically-modified. Suitable lipids whose head groups have been chemically-modified include, but are not limited to, PEG-modified lipids, such as 1,2-Diacyl-sn-Glycero-3-Phosphoethanolamine-N-[Methoxy(Polyethylene glycol)-2000]; functionalised PEG Lipids, such as 1,2-Distearoyl-sn-Glycero-3-Phosphoethanolamine-N-[Biotinyl(Polyethylene Glycol)-2000]; and lipids modified for conjugation, such as 1,2-Dioleoyl-sn-Glycero-3-Phosphoethanolamine-N-(succinyl) and 1,2-Dipalmitoyl-sn-Glycero-3-Phosphoethanolamine-N-(Biotinyl). Suitable lipids whose tail groups have been chemically-modified include, but are not limited to, polymerisable lipids, such as 1,2-bis(10,12-tricosadiynoyl)-sn-Glycero-3-Phosphocholine; fluorinated lipids, such as 1-Palmitoyl-2-(16-Fluoropalmitoyl)-sn-Glycero-3-Phosphocholine; deuterated lipids, such as 1,2-Dipalmitoyl-D62-sn-Glycero-3-Phosphocholine; and ether linked lipids, such as 1,2-Di-O-phytanyl-sn-Glycero-3-Phosphocholine. The lipids may be chemically-modified or functionalised to facilitate coupling of the polynucleotide.
The amphiphilic layer, for example the lipid composition, typically comprises one or more additives that will affect the properties of the layer. Suitable additives include, but are not limited to, fatty acids, such as palmitic acid, myristic acid and oleic acid; fatty alcohols, such as palmitic alcohol, myristic alcohol and oleic alcohol; sterols, such as cholesterol, ergosterol, lanosterol, sitosterol and stigmasterol; lysophospholipids, such as 1-Acyl-2-Hydroxy-sn-Glycero-3-Phosphocholine; and ceramides.
In another preferred embodiment, the membrane is a solid state layer. Solid state layers can be formed from both organic and inorganic materials including, but not limited to, microelectronic materials, insulating materials such as HfO₂, Si₃N₄, Al₂O₃, and SiO, organic and inorganic polymers such as polyamide, plastics such as Teflon® or elastomers such as two-component addition-cure silicone rubber, and glasses. The solid state layer may be formed by atomic layer deposition (ALD). The ALD solid state layer may comprise alternating layers of HfO₂ and Al₂O₃. The solid state layer may be formed from monatomic layers, such as graphene, or layers that are only a few atoms thick. Suitable graphene layers are disclosed in International Application No. PCT/US2008/010637 (published as WO 2009/035647). Yusko et al., Nature Nanotechnology, 2011; 6: 253-260 and US Patent Application No. 2013/0048499 describe the delivery of proteins to transmembrane pores in solid state layers without the use of microparticles. The method of the invention may be used to improve the delivery in the methods disclosed in these documents.
The method is typically carried out using (i) an artificial amphiphilic layer comprising a pore, (ii) an isolated, naturally-occurring lipid bilayer comprising a pore, or (iii) a cell having a pore inserted therein. The method is typically carried out using an artificial amphiphilic layer, such as an artificial triblock copolymer layer. The layer may comprise other transmembrane and/or intramembrane proteins as well as other molecules in addition to the pore. Suitable apparatus and conditions are discussed below. The method of the invention is typically carried out in vitro.

Transmembrane Pore

A transmembrane pore is a structure that crosses the membrane to some degree. Typically, a transmembrane pore comprises a first opening and a second opening with a lumen extending between the first opening and the second opening. The transmembrane pore permits hydrated ions driven by an applied potential to flow across or within the membrane. The transmembrane pore typically crosses the entire membrane so that hydrated ions may flow from one side of the membrane to the other side of the membrane. However, the transmembrane pore does not have to cross the membrane. It may be closed at one end. For instance, the pore may be a well, gap, channel, trench or slit in the membrane along which or into which hydrated ions may flow.
Any transmembrane pore may be used in the invention. The pore may be biological or artificial. Suitable pores include, but are not limited to, protein pores, polynucleotide pores and solid state pores. The pore may be a DNA origami pore (Langecker et al., Science, 2012; 338: 932-936).
The transmembrane pore is preferably a transmembrane protein pore. A transmembrane protein pore is a polypeptide or a collection of polypeptides that permits hydrated ions, such as polynucleotide, to flow from one side of a membrane to the other side of the membrane. In the present invention, the transmembrane protein pore is capable of forming a pore that permits hydrated ions driven by an applied potential to flow from one side of the membrane to the other. The transmembrane protein pore preferably permits polynucleotides to flow from one side of the membrane, such as a triblock copolymer membrane, to the other. The transmembrane protein pore allows a polynucleotide, such as DNA or RNA, to be moved through the pore.
The transmembrane protein pore may be a monomer or an oligomer. The pore is preferably made up of several repeating subunits, such as at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, or at least 16 subunits. The pore is preferably a hexameric, heptameric, octameric or nonameric pore. The pore may be a homo-oligomer or a hetero-oligomer.
The transmembrane protein pore typically comprises a barrel or channel through which the ions may flow. The subunits of the pore typically surround a central axis and contribute strands to a transmembrane β barrel or channel or a transmembrane α-helix bundle or channel.
The barrel or channel of the transmembrane protein pore typically comprises amino acids that facilitate interaction with analytes, such as nucleotides, polynucleotides or nucleic acids. These amino acids are preferably located near a constriction of the barrel or channel. The transmembrane protein pore typically comprises one or more positively charged amino acids, such as arginine, lysine or histidine, or aromatic amino acids, such as tyrosine or tryptophan. These amino acids typically facilitate the interaction between the pore and nucleotides, polynucleotides or nucleic acids.
Transmembrane protein pores for use in accordance with the invention can be derived from β-barrel pores or α-helix bundle pores. β-barrel pores comprise a barrel or channel that is formed from β-strands. Suitable β-barrel pores include, but are not limited to, β-toxins, such as α-hemolysin, anthrax toxin and leukocidins, and outer membrane proteins/porins of bacteria, such as Mycobacterium smegmatis porin (Msp), for example MspA, MspB, MspC or MspD, CsgG, outer membrane porin F (OmpF), outer membrane porin G (OmpG), outer membrane phospholipase A and Neisseria autotransporter lipoprotein (NalP) and other pores, such as lysenin. α-helix bundle pores comprise a barrel or channel that is formed from α-helices. Suitable α-helix bundle pores include, but are not limited to, inner membrane proteins and α outer membrane proteins, such as WZA and ClyA toxin. The transmembrane pore may be derived from lysenin. Suitable pores derived from CsgG are disclosed in International Application No. PCT/EP2015/069965. Suitable pores derived from lysenin are disclosed in International Application No. PCT/GB2013/050667 (published as WO 2013/153359). The transmembrane pore may be derived from or based on Msp, α-hemolysin (α-HL), lysenin, CsgG, ClyA, Sp1 and haemolytic protein fragaceatoxin C (FraC). The wild type α-hemolysin pore is formed of 7 identical monomers or sub-units (i.e., it is heptameric). The sequence of one monomer or sub-unit of α-hemolysin-NN is shown in SEQ ID NO: 4.
The transmembrane protein pore is preferably derived from Msp, more preferably from MspA. Such a pore will be oligomeric and typically comprises 7, 8, 9 or 10 monomers derived from Msp. The pore may be a homo-oligomeric pore derived from Msp comprising identical monomers. Alternatively, the pore may be a hetero-oligomeric pore derived from Msp comprising at least one monomer that differs from the others. Preferably the pore is derived from MspA or a homolog or paralog thereof.
A monomer derived from Msp typically comprises the sequence shown in SEQ ID NO: 2 or a variant thereof. SEQ ID NO: 2 is the MS-(B1)8 mutant of the MspA monomer. It includes the following mutations: D90N, D91N, D93N, D118R, D134R and E139K. A variant of SEQ ID NO: 2 is a polypeptide that has an amino acid sequence which varies from that of SEQ ID NO: 2 and which retains its ability to form a pore. The ability of a variant to form a pore can be assayed using any method known in the art. For instance, the variant may be inserted into an amphiphilic layer along with other appropriate subunits and its ability to oligomerise to form a pore may be determined. Methods are known in the art for inserting subunits into membranes, such as amphiphilic layers. For example, subunits may be suspended in a purified form in a solution containing a triblock copolymer membrane such that it diffuses to the membrane and is inserted by binding to the membrane and assembling into a functional state. Alternatively, subunits may be directly inserted into the membrane using the “pick and place” method described in M. A. Holden, H. Bayley. J. Am. Chem. Soc. 2005, 127, 6502-6503 and International Application No. PCT/GB2006/001057 (published as WO 2006/100484).
Over the entire length of the amino acid sequence of SEQ ID NO: 2, a variant will preferably be at least 50% homologous to that sequence based on amino acid similarity or identity. More preferably, the variant may be at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid similarity or identity to the amino acid sequence of SEQ ID NO: 2 over the entire sequence. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid similarity or identity over a stretch of 100 or more, for example 125, 150, 175 or 200 or more, contiguous amino acids (“hard homology”).
Standard methods in the art may be used to determine homology. For example, the UWGCG Package provides the BESTFIT program which can be used to calculate homology, for example used on its default settings (Devereux et al (1984) Nucleic Acids Research 12, p 387-395). The PILEUP and BLAST algorithms can be used to calculate homology or line up sequences (such as identifying equivalent residues or corresponding sequences (typically on their default settings)), for example as described in Altschul S. F. (1993) J Mol Evol 36:290-300; Altschul, S. F et al (1990) J Mol Biol 215:403-10. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). Similarity can be measured using pairwise identity or by applying a scoring matrix such as BLOSUM62 and converting to an equivalent identity. Since they represent functional rather than evolved changes, deliberately mutated positions would be masked when determining homology. Similarity may be determined more sensitively by the application of position-specific scoring matrices using, for example, PSIBLAST on a comprehensive database of protein sequences. A different scoring matrix could be used that reflects amino acid chemico-physical properties rather than frequency of substitution over evolutionary time scales (e.g. charge).
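By way of illustration only, the pairwise identity calculation and the contiguous-stretch criterion described above can be sketched as follows. The sketch assumes that the two sequences have already been aligned by an external tool (for example BESTFIT or BLAST), that gaps are written as "-", and that percentages are taken over compared alignment columns; the function names, the `masked` parameter and these conventions are hypothetical choices made for the example and are not prescribed by the methods cited above.

```python
def percent_identity(aligned_a, aligned_b, masked=frozenset()):
    """Percent identity over aligned columns.

    Columns whose 0-based index is in `masked` (e.g. deliberately mutated
    positions) and columns where both sequences have a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for index, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        if index in masked or (a == "-" and b == "-"):
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def has_identical_stretch(aligned_a, aligned_b, window, threshold):
    """True if any contiguous run of `window` columns reaches `threshold` percent identity."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    for start in range(len(aligned_a) - window + 1):
        stop = start + window
        if percent_identity(aligned_a[start:stop], aligned_b[start:stop]) >= threshold:
            return True
    return False
```

For instance, `has_identical_stretch(variant, reference, 100, 80.0)` would test for at least 80% identity over some stretch of 100 contiguous aligned positions, where `variant` and `reference` are pre-aligned strings supplied by the user.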
SEQ ID NO: 2 is the MS-(B1)8 mutant of the MspA monomer. The variant may comprise any of the mutations in the MspB, C or D monomers compared with MspA. The mature forms of MspB, C and D are shown in SEQ ID NOs: 5 to 7. In particular, the variant may comprise the following substitution present in MspB: A138P. The variant may comprise one or more of the following substitutions present in MspC: A96G, N102E and A138P. The variant may comprise one or more of the following mutations present in MspD: Deletion of G1, L2V, E5Q, L8V, D13G, W21A, D22E, K47T, I49H, I68V, D91G, A96Q, N102D, S103T, V104I, S136K and G141A.
The variant may comprise combinations of one or more of the mutations and substitutions from Msp B, C and D. The variant preferably comprises the mutation L88N. A variant of SEQ ID NO: 2 has the mutation L88N in addition to all the mutations of MS-B1 and is called MS-(B2)8. The pore used in the invention is preferably MS-(B2)8. The variant of SEQ ID NO: 2 preferably comprises one or more of D56N, D56F, E59R, G75S, G77S, A96D and Q126R. A variant of SEQ ID NO: 2 has the mutations G75S/G77S/L88N/Q126R in addition to all the mutations of MS-B1 and is called MS-B2C. The pore used in the invention is preferably MS-(B2)8 or MS-(B2C)8. The variant of SEQ ID NO: 2 preferably comprises N93D. The variant more preferably comprises the mutations G75S/G77S/L88N/N93D/Q126R.
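The substitution notation used above (original residue, position, replacement residue, with "/" separating multiple substitutions) can be read mechanically. The following is a minimal sketch, assuming 1-based numbering on the monomer sequence and single-letter substitutions only; deletions such as "Deletion of G1" are not handled. The function names and the short test sequence are hypothetical and do not correspond to any SEQ ID NO.

```python
import re

# One substitution token: original residue, 1-based position, replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(spec):
    """Split a string such as 'G75S/G77S/L88N' into (original, position, replacement) tuples."""
    mutations = []
    for token in spec.split("/"):
        match = _SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognised substitution: {token!r}")
        original, position, replacement = match.groups()
        mutations.append((original, int(position), replacement))
    return mutations


def apply_mutations(sequence, spec):
    """Return a copy of `sequence` with the substitutions in `spec` applied.

    Raises ValueError if a position is out of range or the residue found at a
    position differs from the original residue named in the notation.
    """
    residues = list(sequence)
    for original, position, replacement in parse_mutations(spec):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(
                f"expected {original} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)
```

Checking the original residue before substituting guards against applying a mutation list to the wrong reference sequence or numbering scheme.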
Amino acid substitutions may be made to the amino acid sequence of SEQ ID NO: 2 in addition to those discussed above, for example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions.
Conservative substitutions replace amino acids with other amino acids of similar chemical structure, similar chemical properties or similar side-chain volume. The amino acids introduced may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge to the amino acids they replace. Alternatively, the conservative substitution may introduce another amino acid that is aromatic or aliphatic in the place of a pre-existing aromatic or aliphatic amino acid.
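As a rough illustration of the principle above, a substitution can be screened by asking whether the original and replacement residues share a property class. The grouping below is a simplified textbook-style classification assumed purely for this sketch; it is not a definition taken from this document, residues outside the listed classes (C, G, M and P) are treated as having no class, and the names are hypothetical.

```python
# Simplified, assumed property classes for the 20 standard amino acids.
PROPERTY_CLASSES = {
    "aliphatic": set("AVLI"),
    "aromatic": set("FWY"),
    "basic": set("KRH"),
    "acidic": set("DE"),
    "polar_uncharged": set("STNQ"),
}


def shared_classes(original, replacement):
    """Names of the property classes that contain both residues, sorted alphabetically."""
    return sorted(
        name
        for name, members in PROPERTY_CLASSES.items()
        if original in members and replacement in members
    )


def is_conservative(original, replacement):
    """True if the residues differ but share at least one assumed property class."""
    return original != replacement and bool(shared_classes(original, replacement))
```

Under this assumed grouping, replacing leucine with isoleucine is classed as conservative, whereas replacing aspartic acid with arginine (a reversal of charge) is not.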
The transmembrane protein pore is preferably derived from CsgG, more preferably from CsgG from E. coli Str. K-12 substr. MC4100. Such a pore will be oligomeric and typically comprises 7, 8, 9 or 10 monomers derived from CsgG. The pore may be a homo-oligomeric pore derived from CsgG comprising identical monomers. Alternatively, the pore may be a hetero-oligomeric pore derived from CsgG comprising at least one monomer that differs from the others.
A monomer derived from CsgG typically comprises the sequence shown in SEQ ID NO: 114 or a variant thereof. A variant of SEQ ID NO: 114 is a polypeptide that has an amino acid sequence which varies from that of SEQ ID NO: 114 and which retains its ability to form a pore.
The ability of a variant to form a pore can be assayed using any method known in the art as discussed above.
Over the entire length of the amino acid sequence of SEQ ID NO: 114, a variant will preferably be at least 50% homologous to that sequence based on amino acid similarity or identity. More preferably, the variant may be at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid similarity or identity to the amino acid sequence of SEQ ID NO: 114 over the entire sequence. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid similarity or identity over a stretch of 100 or more, for example 125, 150, 175 or 200 or more, contiguous amino acids (“hard homology”). Homology can be measured as discussed above.
The variant of SEQ ID NO: 114 may comprise any of the mutations disclosed in International Application No. PCT/GB2015/069965 (published as WO 2016/034591). The variant of SEQ ID NO: 114 preferably comprises one or more of the following (i) one or more mutations at the following positions (i.e. mutations at one or more of the following positions) N40, D43, E44, S54, S57, Q62, R97, E101, E124, E131, R142, T150 and R192, such as one or more mutations at the following positions (i.e. mutations at one or more of the following positions) N40, D43, E44, S54, S57, Q62, E101, E131 and T150 or N40, D43, E44, E101 and E131; (ii) mutations at Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56; (iii) Q42R or Q42K; (iv) K49R; (v) N102R, N102F, N102Y or N102W; (vi) D149N, D149Q or D149R; (vii) E185N, E185Q or E185R; (viii) D195N, D195Q or D195R; (ix) E201N, E201Q or E201R; (x) E203N, E203Q or E203R; and (xi) deletion of one or more of the following positions F48, K49, P50, Y51, P52, A53, S54, N55, F56 and S57. The variant may comprise any combination of (i) to (xi). If the variant comprises any one of (i) and (iii) to (xi), it may further comprise a mutation at one or more of Y51, N55 and F56, such as at Y51, N55, F56, Y51/N55, Y51/F56, N55/F56 or Y51/N55/F56.
Preferred variants of SEQ ID NO: 114 which form pores in which fewer nucleotides contribute to the current as the polynucleotide moves through the pore comprise Y51A/F56A, Y51A/F56N, Y51I/F56A, Y51L/F56A, Y51T/F56A, Y51I/F56N, Y51L/F56N or Y51T/F56N or more preferably Y51I/F56A, Y51L/F56A or Y51T/F56A.
Preferred variants of SEQ ID NO: 114 which form pores displaying an increased range comprise mutations at the following positions:

- Y51, F56, D149, E185, E201 and E203;
- N55 and F56;
- Y51 and F56;
- Y51, N55 and F56; or
- F56 and N102.

Preferred variants of SEQ ID NO: 114 which form pores displaying an increased range comprise:

- Y51N, F56A, D149N, E185R, E201N and E203N;
- N55S and F56Q;
- Y51A and F56A;
- Y51A and F56N;
- Y51I and F56A;
- Y51L and F56A;
- Y51T and F56A;
- Y51I and F56N;
- Y51L and F56N;
- Y51T and F56N;
- Y51T and F56Q;
- Y51A, N55S and F56A;
- Y51A, N55S and F56N;
- Y51T, N55S and F56Q; or
- F56Q and N102R.

Preferred variants of SEQ ID NO: 114 which form pores in which fewer nucleotides contribute to the current as the polynucleotide moves through the pore comprise mutations at the following positions:

- N55 and F56, such as N55X and F56Q, wherein X is any amino acid; or
- Y51 and F56, such as Y51X and F56Q, wherein X is any amino acid.
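For illustration, a pattern such as "Y51X and F56Q, wherein X is any amino acid" can be expanded into an explicit list of candidate variants. The sketch below assumes that X ranges over the 20 standard amino acids and, by default, omits the case in which X equals the original residue (which would be no change at that position); both are assumptions of the example, and the function name is hypothetical.

```python
# Single-letter codes of the 20 standard amino acids (assumed range for X).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_any(original, position, fixed, include_original=False):
    """List variants '<original><position><X>/<fixed>' for every residue X.

    `fixed` is the accompanying substitution written in the same notation,
    for example 'F56Q'.
    """
    variants = []
    for residue in AMINO_ACIDS:
        if residue == original and not include_original:
            continue
        variants.append(f"{original}{position}{residue}/{fixed}")
    return variants
```

Calling `expand_any("Y", 51, "F56Q")` yields 19 candidates, among them Y51T/F56Q, and `expand_any("N", 55, "F56Q")` likewise includes N55S/F56Q.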

Preferred variants of SEQ ID NO: 114 which form pores displaying an increased throughput comprise mutations at the following positions:

- D149, E185 and E203;
- D149, E185, E201 and E203; or
- D149, E185, D195, E201 and E203.

Preferred variants which form pores displaying an increased throughput comprise:

- D149N, E185N and E203N;
- D149N, E185N, E201N and E203N;
- D149N, E185R, D195N, E201N and E203N; or
- D149N, E185R, D195N, E201R and E203N.

Preferred variants of SEQ ID NO: 114 which form pores in which capture of the polynucleotide is increased comprise the following mutations:

- D43N/Y51T/F56Q;
- E44N/Y51T/F56Q;
- D43N/E44N/Y51T/F56Q;
- Y51T/F56Q/Q62R;
- D43N/Y51T/F56Q/Q62R;
- E44N/Y51T/F56Q/Q62R; or
- D43N/E44N/Y51T/F56Q/Q62R.

Preferred variants of SEQ ID NO: 114 comprise the following mutations:

- D149R/E185R/E201R/E203R or Y51T/F56Q/D149R/E185R/E201R/E203R;
- D149N/E185N/E201N/E203N or Y51T/F56Q/D149N/E185N/E201N/E203N;
- E201R/E203R or Y51T/F56Q/E201R/E203R;
- E201N/E203R or Y51T/F56Q/E201N/E203R;
- E203R or Y51T/F56Q/E203R;
- E203N or Y51T/F56Q/E203N;
- E201R or Y51T/F56Q/E201R;
- E201N or Y51T/F56Q/E201N;
- E185R or Y51T/F56Q/E185R;
- E185N or Y51T/F56Q/E185N;
- D149R or Y51T/F56Q/D149R;
- D149N or Y51T/F56Q/D149N;
- R142E or Y51T/F56Q/R142E;
- R142N or Y51T/F56Q/R142N;
- R192E or Y51T/F56Q/R192E; or
- R192N or Y51T/F56Q/R192N.

Preferred variants of SEQ ID NO: 114 comprise the following mutations:

- Y51A/F56Q/E101N/N102R;
- Y51A/F56Q/R97N/N102G;
- Y51A/F56Q/R97N/N102R;
- Y51A/F56Q/R97N;
- Y51A/F56Q/R97G;
- Y51A/F56Q/R97L;
- Y51A/F56Q/N102R;
- Y51A/F56Q/N102F;
- Y51A/F56Q/N102G;
- Y51A/F56Q/E101R;
- Y51A/F56Q/E101F;
- Y51A/F56Q/E101N; or
- Y51A/F56Q/E101G.

The variant of SEQ ID NO: 114 may comprise any of the substitutions present in another CsgG homologue. Preferred CsgG homologues are shown in SEQ ID NOs: 3 to 7 and 26 to 41 of International Application No. PCT/GB2015/069965 (published as WO 2016/034591).
Any of the proteins described herein, such as the transmembrane protein pores, may be modified to assist their identification or purification, for example by the addition of histidine residues (a his tag), aspartic acid residues (an asp tag), a streptavidin tag, a flag tag, a SUMO tag, a GST tag or a MBP tag, or by the addition of a signal sequence to promote their secretion from a cell where the polypeptide does not naturally contain such a sequence. An alternative to introducing a genetic tag is to chemically react a tag onto a native or engineered position on the pore or construct. An example of this would be to react a gel-shift reagent to a cysteine engineered on the outside of the pore. This has been demonstrated as a method for separating hemolysin hetero-oligomers (Chem Biol. 1997 July; 4(7):497-505).
The pore may be labelled with a revealing label. The revealing label may be any suitable label which allows the pore to be detected. Suitable labels include, but are not limited to, fluorescent molecules, radioisotopes, e.g. ¹²⁵I, ³⁵S, enzymes, antibodies, antigens, polynucleotides and ligands such as biotin.
Any of the proteins described herein, such as the transmembrane protein pores, may be made synthetically or by recombinant means. For example, the pore may be synthesised by in vitro translation and transcription (IVTT). The amino acid sequence of the pore may be modified to include non-naturally occurring amino acids or to increase the stability of the protein. When a protein is produced by synthetic means, such amino acids may be introduced during production. The pore may also be altered following either synthetic or recombinant production.
Any of the proteins described herein, such as the transmembrane protein pores, can be produced using standard methods known in the art. Polynucleotide sequences encoding a pore or construct may be derived and replicated using standard methods in the art. Polynucleotide sequences encoding a pore or construct may be expressed in a bacterial host cell using standard techniques in the art. The pore may be produced in a cell by in situ expression of the polypeptide from a recombinant expression vector. The expression vector optionally carries an inducible promoter to control the expression of the polypeptide. These methods are described in Sambrook, J. and Russell, D. (2001). Molecular Cloning: A Laboratory Manual, 3rd Edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
The pore may be produced in large scale following purification by any protein liquid chromatography system from protein producing organisms or after recombinant expression. Typical protein liquid chromatography systems include FPLC, AKTA systems, the Bio-Cad system, the Bio-Rad BioLogic system and the Gilson HPLC system.

Coupling

The/each modified polynucleotide preferably comprises one or more anchors which are capable of coupling to the membrane. The method preferably further comprises coupling the target polynucleotide to the membrane using the one or more anchors.
The anchor comprises a group which couples (or binds) to the polynucleotide and a group which couples (or binds) to the membrane. Each anchor may covalently couple (or bind) to the polynucleotide and/or the membrane. The group may be a chemical group and/or a functional group.
The polynucleotide may be coupled to the membrane using any number of anchors, such as 2, 3, 4 or more anchors. For instance, the polynucleotide may be coupled to the membrane using two anchors each of which separately couples (or binds) to both the polynucleotide and membrane.
The one or more anchors may comprise one or more molecular brakes or polynucleotide binding proteins. Each anchor may comprise one or more molecular brakes or polynucleotide binding proteins. The molecular brake(s) or polynucleotide binding protein(s) may be any of those discussed below.
If the membrane is an amphiphilic layer, such as a triblock copolymer membrane, the one or more anchors preferably comprise a polypeptide anchor present in the membrane and/or a hydrophobic anchor present in the membrane. The hydrophobic anchor is preferably a lipid, fatty acid, sterol, carbon nanotube, polypeptide, protein or amino acid, for example cholesterol, palmitate or tocopherol. In preferred embodiments, the one or more anchors are not the pore.
The components of the membrane, such as the amphiphilic molecules, copolymer or lipids, may be chemically-modified or functionalised to form the one or more anchors. Examples of suitable chemical modifications and suitable ways of functionalising the components of the membrane are discussed in more detail below. Any proportion of the membrane components may be functionalised, for example at least 0.01%, at least 0.1%, at least 1%, at least 10%, at least 25%, at least 50% or 100%.
The polynucleotide may be coupled directly to the membrane. The one or more anchors used to couple the polynucleotide to the membrane preferably comprise a linker. The one or more anchors may comprise one or more, such as 2, 3, 4 or more, linkers. One linker may be used to couple more than one, such as 2, 3, 4 or more, polynucleotides to the membrane.
Preferred linkers include, but are not limited to, polymers, such as polynucleotides, polyethylene glycols (PEGs), polysaccharides and polypeptides. These linkers may be linear, branched or circular. For instance, the linker may be a circular polynucleotide. The polynucleotide may hybridise to a complementary sequence on the circular polynucleotide linker.
The one or more anchors or one or more linkers may comprise a component that can be cut or broken down, such as a restriction site or a photolabile group.
Functionalised linkers and the ways in which they can couple molecules are known in the art. For instance, linkers functionalised with maleimide groups will react with and attach to cysteine residues in proteins. In the context of this invention, the protein may be present in the membrane, may be the polynucleotide itself or may be used to couple (or bind) to the polynucleotide. This is discussed in more detail below.
Crosslinkage of polynucleotides can be avoided using a “lock and key” arrangement. Only one end of each linker may react together to form a longer linker and the other ends of the linker each react with the polynucleotide or membrane respectively. Such linkers are described in International Application No. PCT/GB10/000132 (published as WO 2010/086602).
The use of a linker is preferred in the sequencing embodiments discussed below. If a polynucleotide is permanently coupled directly to the membrane in the sense that it does not uncouple when interacting with the pore, then some sequence data will be lost as the sequencing run cannot continue to the end of the polynucleotide due to the distance between the membrane and the pore. If a linker is used, then the polynucleotide can be processed to completion.
The coupling may be permanent or stable. In other words, the coupling may be such that the polynucleotide remains coupled to the membrane when interacting with the pore.
The coupling may be transient. In other words, the coupling may be such that the polynucleotide may decouple from the membrane when interacting with the pore. For certain applications, such as aptamer detection and polynucleotide sequencing, the transient nature of the coupling is preferred. If a permanent or stable linker is attached directly to either the 5′ or 3′ end of a polynucleotide and the linker is shorter than the distance between the membrane and the transmembrane pore's channel, then some sequence data will be lost as the sequencing run cannot continue to the end of the polynucleotide. If the coupling is transient, then when the coupled end randomly becomes free of the membrane, the polynucleotide can be processed to completion. Chemical groups that form permanent/stable or transient links are discussed in more detail below. The polynucleotide may be transiently coupled to an amphiphilic layer or triblock copolymer membrane using cholesterol or a fatty acyl chain. Any fatty acyl chain having a length of from 6 to 30 carbon atoms, such as hexadecanoic acid, may be used.
In preferred embodiments, a polynucleotide, such as a nucleic acid, is coupled to an amphiphilic layer such as a triblock copolymer membrane or lipid bilayer. Coupling of nucleic acids to synthetic lipid bilayers has been carried out previously with various different tethering strategies. These are summarised in Table 3 below.

TABLE 3

Anchor comprising	Type of coupling	Reference
Thiol	Stable	Yoshina-Ishii, C. and S. G. Boxer (2003). “Arrays of mobile tethered vesicles on supported lipid bilayers.” J Am Chem Soc 125(13): 3696-7.
Biotin	Stable	Nikolov, V., R. Lipowsky, et al. (2007). “Behavior of giant vesicles with anchored DNA molecules.” Biophys J 92(12): 4356-68
Cholesterol	Transient	Pfeiffer, I. and F. Hook (2004). “Bivalent cholesterol-based coupling of oligonucletides to lipid membrane assemblies.” J Am Chem Soc 126(33): 10224-5
Surfactant (e.g. Lipid, Palmitate, etc)	Stable	van Lengerich, B., R. J. Rawle, et al. “Covalent attachment of lipid vesicles to a fluid-supported bilayer allows observation of DNA-mediated vesicle interactions.” Langmuir 26(11): 8666-72

Synthetic polynucleotides and/or linkers may be functionalised using a modified phosphoramidite in the synthesis reaction, which readily allows the direct addition of suitable anchoring groups, such as cholesterol, tocopherol, palmitate, thiol, lipid and biotin groups. These different attachment chemistries give a suite of options for attachment to polynucleotides. Each modification group couples the polynucleotide in a slightly different way, and coupling is not always permanent, so the different groups give different dwell times of the polynucleotide on the membrane. The advantages of transient coupling are discussed above.
Coupling of polynucleotides to a linker or to a functionalised membrane can also be achieved by a number of other means provided that a complementary reactive group or an anchoring group can be added to the polynucleotide. The addition of reactive groups to either end of a polynucleotide has been reported previously. A thiol group can be added to the 5′ of ssDNA or dsDNA using T4 polynucleotide kinase and ATPγS (Grant, G. P. and P. Z. Qin (2007). “A facile method for attaching nitroxide spin labels at the 5′ terminus of nucleic acids.” Nucleic Acids Res 35(10): e77). An azide group can be added to the 5′-phosphate of ssDNA or dsDNA using T4 polynucleotide kinase and γ-[2-Azidoethyl]-ATP or γ-[6-Azidohexyl]-ATP. Using thiol or Click chemistry, a tether containing either a thiol, iodoacetamide, OPSS or maleimide group (reactive to thiols) or a DIBO (dibenzocyclooctyne) or alkyne group (reactive to azides) can be covalently attached to the polynucleotide. A more diverse selection of chemical groups, such as biotin, thiols and fluorophores, can be added using terminal transferase to incorporate modified oligonucleotides to the 3′ of ssDNA (Kumar, A., P. Tchen, et al. (1988). “Nonradioactive labeling of synthetic oligonucleotide probes with terminal deoxynucleotidyl transferase.” Anal Biochem 169(2): 376-82). Streptavidin/biotin and/or streptavidin/desthiobiotin coupling may be used for any other polynucleotide. A polynucleotide can be coupled to a membrane using streptavidin/biotin and streptavidin/desthiobiotin. It may also be possible that anchors may be directly added to polynucleotides using terminal transferase with suitably modified nucleotides (e.g. cholesterol or palmitate).
The one or more anchors preferably couple the polynucleotide to the membrane via hybridisation. The hybridisation may be present in any part of the one or more anchors, such as between the one or more anchors and the polynucleotide, within the one or more anchors or between the one or more anchors and the membrane. Hybridisation in the one or more anchors allows coupling in a transient manner as discussed above. For instance, a linker may comprise two or more polynucleotides, such as 3, 4 or 5 polynucleotides, hybridised together. The one or more anchors may hybridise to the polynucleotide. The one or more anchors may hybridise directly to the polynucleotide, directly to a Y adaptor and/or leader sequence attached to the polynucleotide or directly to a hairpin loop adaptor attached to the polynucleotide (as discussed in more detail below). Alternatively, the one or more anchors may be hybridised to one or more, such as 2 or 3, intermediate polynucleotides (or “splints”) which are hybridised to the polynucleotide, to a Y adaptor and/or leader sequence attached to the polynucleotide or to a hairpin loop adaptor attached to the polynucleotide (as discussed in more detail below).
The one or more anchors may comprise a single stranded or double stranded polynucleotide. One part of the anchor may be ligated to a single stranded or double stranded polynucleotide analyte. Ligation of short pieces of ssDNA has been reported using T4 RNA ligase I (Troutt, A. B., M. G. McHeyzer-Williams, et al. (1992). “Ligation-anchored PCR: a simple amplification technique with single-sided specificity.” Proc Natl Acad Sci USA 89(20): 9823-5). Alternatively, either a single stranded or double stranded polynucleotide can be ligated to a double stranded polynucleotide and then the two strands separated by thermal or chemical denaturation. To a double stranded polynucleotide, it is possible to add either a piece of single stranded polynucleotide to one or both of the ends of the duplex, or a double stranded polynucleotide to one or both ends. For addition of single stranded polynucleotides to the double stranded polynucleotide, this can be achieved using T4 RNA ligase I as for ligation to other regions of single stranded polynucleotides. For addition of double stranded polynucleotides to a double stranded polynucleotide then ligation can be “blunt-ended”, with complementary 3′ dA/dT tails on the polynucleotide and added polynucleotide respectively (as is routinely done for many sample prep applications to prevent concatemer or dimer formation) or using “sticky-ends” generated by restriction digestion of the polynucleotide and ligation of compatible adapters. Then, when the duplex is melted, each single strand will have either a 5′ or 3′ modification if a single stranded polynucleotide was used for ligation or a modification at the 5′ end, the 3′ end or both if a double stranded polynucleotide was used for ligation.
If the polynucleotide is a synthetic strand, the one or more anchors can be incorporated during the chemical synthesis of the polynucleotide. For instance, the polynucleotide can be synthesised using a primer having a reactive group attached to it.
Adenylated polynucleotides are intermediates in ligation reactions, where an adenosine-monophosphate is attached to the 5′-phosphate of the polynucleotide. Various kits are available for generation of this intermediate, such as the 5′ DNA Adenylation Kit from NEB. By substituting ATP in the reaction for a modified nucleotide triphosphate, then addition of reactive groups (such as thiols, amines, biotin, azides, etc) to the 5′ of a polynucleotide can be possible. It may also be possible that anchors could be directly added to polynucleotides using a 5′ DNA adenylation kit with suitably modified nucleotides (e.g. cholesterol or palmitate).
A common technique for the amplification of sections of genomic DNA is using polymerase chain reaction (PCR). Here, using two synthetic oligonucleotide primers, a number of copies of the same section of DNA can be generated, where for each copy the 5′ of each strand in the duplex will be a synthetic polynucleotide. Single or multiple nucleotides can be added to the 3′ end of single or double stranded DNA by employing a polymerase. Examples of polymerases which could be used include, but are not limited to, Terminal Transferase, Klenow and E. coli Poly(A) polymerase. By substituting ATP in the reaction for a modified nucleotide triphosphate then anchors, such as cholesterol, thiol, amine, azide, biotin or lipid, can be incorporated into double stranded polynucleotides. Therefore, each copy of the amplified polynucleotide will contain an anchor.
Ideally, the polynucleotide is coupled to the membrane without having to functionalise the polynucleotide. This can be achieved by coupling the one or more anchors, such as a polynucleotide binding protein or a chemical group, to the membrane and allowing the one or more anchors to interact with the polynucleotide or by functionalizing the membrane. The one or more anchors may be coupled to the membrane by any of the methods described herein. In particular, the one or more anchors may comprise one or more linkers, such as maleimide functionalised linkers.
In this embodiment, the polynucleotide is typically RNA, DNA, PNA, TNA or LNA and may be double or single stranded. This embodiment is particularly suited to genomic DNA polynucleotides.
The one or more anchors can comprise any group that couples to, binds to or interacts with single or double stranded polynucleotides, specific nucleotide sequences within the polynucleotide or patterns of modified nucleotides within the polynucleotide, or any other ligand that is present on the polynucleotide.
Suitable binding proteins for use in anchors include, but are not limited to, E. coli single stranded binding protein, P5 single stranded binding protein, T4 gp32 single stranded binding protein, the TOPO V dsDNA binding region, human histone proteins, E. coli HU DNA binding protein and other archaeal, prokaryotic or eukaryotic single stranded or double stranded polynucleotide (or nucleic acid) binding proteins, including those listed below.
The specific nucleotide sequences could be sequences recognised by transcription factors, ribosomes, endonucleases, topoisomerases or replication initiation factors. The patterns of modified nucleotides could be patterns of methylation or damage.
The one or more anchors can comprise any group which couples to, binds to, intercalates with or interacts with a polynucleotide. The group may intercalate or interact with the polynucleotide via electrostatic, hydrogen bonding or Van der Waals interactions. Such groups include a lysine monomer, poly-lysine (which will interact with ssDNA or dsDNA), ethidium bromide (which will intercalate with dsDNA), universal bases or universal nucleotides (which can hybridise with any polynucleotide) and osmium complexes (which can react to methylated bases). A polynucleotide may therefore be coupled to the membrane using one or more universal nucleotides attached to the membrane. Each universal nucleotide may be coupled to the membrane using one or more linkers. The universal nucleotide preferably comprises one of the following nucleobases: hypoxanthine, 4-nitroindole, 5-nitroindole, 6-nitroindole, formylindole, 3-nitropyrrole, nitroimidazole, 4-nitropyrazole, 4-nitrobenzimidazole, 5-nitroindazole, 4-aminobenzimidazole or phenyl (C6-aromatic ring). 
The universal nucleotide more preferably comprises one of the following nucleosides: 2′-deoxyinosine, inosine, 7-deaza-2′-deoxyinosine, 7-deaza-inosine, 2-aza-deoxyinosine, 2-aza-inosine, 2′-O-methylinosine, 4-nitroindole 2′-deoxyribonucleoside, 4-nitroindole ribonucleoside, 5-nitroindole 2′-deoxyribonucleoside, 5-nitroindole ribonucleoside, 6-nitroindole 2′-deoxyribonucleoside, 6-nitroindole ribonucleoside, 3-nitropyrrole 2′-deoxyribonucleoside, 3-nitropyrrole ribonucleoside, an acyclic sugar analogue of hypoxanthine, nitroimidazole 2′-deoxyribonucleoside, nitroimidazole ribonucleoside, 4-nitropyrazole 2′-deoxyribonucleoside, 4-nitropyrazole ribonucleoside, 4-nitrobenzimidazole 2′-deoxyribonucleoside, 4-nitrobenzimidazole ribonucleoside, 5-nitroindazole 2′-deoxyribonucleoside, 5-nitroindazole ribonucleoside, 4-aminobenzimidazole 2′-deoxyribonucleoside, 4-aminobenzimidazole ribonucleoside, phenyl C-ribonucleoside, phenyl C-2′-deoxyribosyl nucleoside, 2′-deoxynebularine, 2′-deoxyisoguanosine, K-2′-deoxyribose, P-2′-deoxyribose and pyrrolidine. The universal nucleotide more preferably comprises 2′-deoxyinosine. The universal nucleotide is more preferably IMP or dIMP. The universal nucleotide is most preferably dPMP (2′-Deoxy-P-nucleoside monophosphate) or dKMP (N6-methoxy-2, 6-diaminopurine monophosphate).
The one or more anchors may couple to (or bind to) the polynucleotide via Hoogsteen hydrogen bonds (where two nucleobases are held together by hydrogen bonds) or reversed Hoogsteen hydrogen bonds (where one nucleobase is rotated through 180° with respect to the other nucleobase). For instance, the one or more anchors may comprise one or more nucleotides, one or more oligonucleotides or one or more polynucleotides which form Hoogsteen hydrogen bonds or reversed Hoogsteen hydrogen bonds with the polynucleotide. These types of hydrogen bonds allow a third polynucleotide strand to wind around a double stranded helix and form a triplex. The one or more anchors may couple to (or bind to) a double stranded polynucleotide by forming a triplex with the double stranded duplex.
In this embodiment at least 1%, at least 10%, at least 25%, at least 50% or 100% of the membrane components may be functionalised.
Where the one or more anchors comprise a protein, they may be able to anchor directly into the membrane without further functionalisation, for example if the protein already has an external hydrophobic region which is compatible with the membrane. Examples of such proteins include, but are not limited to, transmembrane proteins, intramembrane proteins and membrane proteins. Alternatively the protein may be expressed with a genetically fused hydrophobic region which is compatible with the membrane. Such hydrophobic protein regions are known in the art.
The one or more anchors are preferably mixed with the polynucleotide before delivery to the membrane, but the one or more anchors may be contacted with the membrane and subsequently contacted with the polynucleotide.
In another aspect the polynucleotide may be functionalised, using methods described above, so that it can be recognised by a specific binding group. Specifically the polynucleotide may be functionalised with a ligand such as biotin (for binding to streptavidin), amylose (for binding to maltose binding protein or a fusion protein), Ni-NTA (for binding to poly-histidine or poly-histidine tagged proteins) or peptides (such as an antigen).
According to a preferred embodiment, the one or more anchors may be used to couple a polynucleotide to the membrane when the polynucleotide is attached to a leader sequence which preferentially threads into the pore. Leader sequences are discussed in more detail below. Preferably, the polynucleotide is attached (such as ligated) to a leader sequence which preferentially threads into the pore. Such a leader sequence may comprise a homopolymeric polynucleotide or an abasic region. The leader sequence is typically designed to hybridise to the one or more anchors either directly or via one or more intermediate polynucleotides (or splints). In such instances, the one or more anchors typically comprise a polynucleotide sequence which is complementary to a sequence in the leader sequence or a sequence in the one or more intermediate polynucleotides (or splints). In such instances, the one or more splints typically comprise a polynucleotide sequence which is complementary to a sequence in the leader sequence.
Any of the methods discussed above for coupling polynucleotides to membranes, such as amphiphilic layers, can of course be applied to other polynucleotide and membrane combinations. In some embodiments, an amino acid, peptide, polypeptide or protein is coupled to an amphiphilic layer, such as a triblock copolymer layer or lipid bilayer. Various methodologies for the chemical attachment of such polynucleotides are available. An example of a molecule used in chemical attachment is EDC (1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride). Reactive groups can also be added to the 5′ of polynucleotides using commercially available kits (Thermo Pierce, Part No. 22980). Suitable methods include, but are not limited to, transient affinity attachment using histidine residues and Ni-NTA, as well as more robust covalent attachment by reactive cysteines, lysines or non natural amino acids.

Polynucleotide Characterisation

Any number of polynucleotides can be investigated. For instance, the method of the invention may concern characterising two or more polynucleotides, such as 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 20 or more, 30 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, 5,000 or more, 10,000 or more, 100,000 or more, 1,000,000 or more or 5,000,000 or more, polynucleotides. The two or more polynucleotides may be delivered using the same microparticle or different microparticles.
A microparticle is a microscopic particle whose size is typically measured in micrometres (μm). Microparticles may also be known as microspheres or microbeads. The microparticle may be a nanoparticle. A nanoparticle is a microscopic particle whose size is typically measured in nanometres (nm).
A microparticle typically has a particle size of from about 0.001 μm to about 500 μm. For instance, a nanoparticle may have a particle size of from about 0.01 μm to about 200 μm or about 0.1 μm to about 100 μm. More often, a microparticle has a particle size of from about 0.5 μm to about 100 μm, or for instance from about 1 μm to about 50 μm. The microparticle may have a particle size of from about 1 nm to about 1000 nm, such as from about 10 nm to about 500 nm, about 20 nm to about 200 nm or from about 30 nm to about 100 nm.
If two or more polynucleotides are characterised, they may be different from one another. The two or more polynucleotides may be two or more instances of the same polynucleotide. This allows proof reading.
The polynucleotides can be naturally occurring or artificial. For instance, the method may be used to verify the sequence of two or more manufactured oligonucleotides. The methods are typically carried out in vitro.
The method may involve measuring one, two, three, four or five or more characteristics of each polynucleotide. The one or more characteristics are preferably selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide and (v) whether or not the polynucleotide is modified. Any combination of (i) to (v) may be measured in accordance with the invention, such as {i}, {ii}, {iii}, {iv}, {v}, {i,ii}, {i,iii}, {i,iv}, {i,v}, {ii,iii}, {ii,iv}, {ii,v}, {iii,iv}, {iii,v}, {iv,v}, {i,ii,iii}, {i,ii,iv}, {i,ii,v}, {i,iii,iv}, {i,iii,v}, {i,iv,v}, {ii,iii,iv}, {ii,iii,v}, {ii,iv,v}, {iii,iv,v}, {i,ii,iii,iv}, {i,ii,iii,v}, {i,ii,iv,v}, {i,iii,iv,v}, {ii,iii,iv,v} or {i,ii,iii,iv,v}.
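The list of combinations above is simply every non-empty subset of the five characteristics. As a minimal sketch (the dictionary of labels and the function name are illustrative assumptions, not part of the specification), the same list can be generated programmatically:

```python
from itertools import combinations

# Labels (i) to (v) as used in the text; the short descriptions are paraphrases.
CHARACTERISTICS = {
    "i": "length",
    "ii": "identity",
    "iii": "sequence",
    "iv": "secondary structure",
    "v": "modification status",
}


def characteristic_combinations(labels):
    """Return every non-empty subset of labels, smallest subsets first."""
    labels = list(labels)
    return [
        combo
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
    ]


combos = characteristic_combinations(CHARACTERISTICS)

# Five characteristics give 2**5 - 1 non-empty subsets.
print(len(combos))
for combo in combos:
    print("{" + ",".join(combo) + "}")
```

The enumeration order produced by `itertools.combinations` matches the order in which the combinations are written out in the text.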
For (i), the length of the polynucleotide may be measured for example by determining the number of interactions between the polynucleotide and the pore or the duration of interaction between the polynucleotide and the pore.
For (ii), the identity of the polynucleotide may be measured in a number of ways. The identity of the polynucleotide may be measured in conjunction with measurement of the sequence of the polynucleotide or without measurement of the sequence of the polynucleotide. The former is straightforward; the polynucleotide is sequenced and thereby identified. The latter may be done in several ways. For instance, the presence of a particular motif in the polynucleotide may be measured (without measuring the remaining sequence of the polynucleotide). Alternatively, the measurement of a particular electrical and/or optical signal in the method may identify the polynucleotide as coming from a particular source.
For (iii), the sequence of the polynucleotide can be determined as described previously. Suitable sequencing methods, particularly those using electrical measurements, are described in Stoddart D et al., Proc Natl Acad Sci, 12; 106(19):7702-7, Lieberman K R et al, J Am Chem Soc. 2010; 132(50):17961-72, and International Application WO 2000/28312.
For (iv), the secondary structure may be measured in a variety of ways. For instance, if the method involves an electrical measurement, the secondary structure may be measured using a change in dwell time or a change in current flowing through the pore. This allows regions of single-stranded and double-stranded polynucleotide to be distinguished.
For (v), the presence or absence of any modification may be measured. The method preferably comprises determining whether or not the polynucleotide is modified by methylation, by oxidation, by damage, with one or more proteins or with one or more labels, tags or spacers. Specific modifications will result in specific interactions with the pore which can be measured using the methods described below. For instance, methylcytosine may be distinguished from cytosine on the basis of the current flowing through the pore during its interaction with each nucleotide.
The methods may be carried out using any apparatus that is suitable for investigating a membrane/pore system in which a pore is present in a membrane. The method may be carried out using any apparatus that is suitable for transmembrane pore sensing. For example, the apparatus comprises a chamber comprising an aqueous solution and a barrier that separates the chamber into two sections. The barrier typically has an aperture in which the membrane containing the pore is formed. Alternatively the barrier forms the membrane in which the pore is present.
The methods may be carried out using the apparatus described in International Application No. PCT/GB08/000562 (WO 2008/102120).
A variety of different types of measurements may be made. This includes without limitation: electrical measurements and optical measurements. A suitable optical method involving the measurement of fluorescence is disclosed by J. Am. Chem. Soc. 2009, 131 1652-1653. Possible electrical measurements include: current measurements, impedance measurements, tunnelling measurements (Ivanov A P et al., Nano Lett. 2011 Jan. 12; 11(1):279-85), and FET measurements (International Application WO 2005/124888). Optical measurements may be combined with electrical measurements (Soni G V et al., Rev Sci Instrum. 2010 January; 81(1):014301). The measurement may be a transmembrane current measurement such as measurement of ionic current flowing through the pore.
Electrical measurements may be made using standard single channel recording equipment as described in Stoddart D et al., Proc Natl Acad Sci, 12; 106(19):7702-7, Lieberman K R et al, J Am Chem Soc. 2010; 132(50):17961-72, and International Application WO 2000/28312. Alternatively, electrical measurements may be made using a multi-channel system, for example as described in International Application WO 2009/077734 and International Application WO 2011/067559.
The method is preferably carried out with a potential applied across the membrane. The applied potential may be a voltage potential. Alternatively, the applied potential may be a chemical potential. An example of this is using a salt gradient across a membrane, such as an amphiphilic layer. A salt gradient is disclosed in Holden et al., J Am Chem Soc. 2007 Jul. 11; 129(27):8650-5. In some instances, the current passing through the pore as a polynucleotide moves with respect to the pore is used to estimate or determine the sequence of the polynucleotide. This is strand sequencing.
The methods may involve measuring the current passing through the pore as the polynucleotide moves with respect to the pore. Therefore the apparatus may also comprise an electrical circuit capable of applying a potential and measuring an electrical signal across the membrane and pore. The methods may be carried out using a patch clamp or a voltage clamp. The methods preferably involve the use of a voltage clamp.
In a preferred embodiment, the method comprises:

- (a) contacting the/each modified polynucleotide with a transmembrane pore such that at least one strand of the/each polynucleotide moves through the pore; and
- (b) measuring the current passing through the pore as at least one strand of the/each polynucleotide moves with respect to the pore wherein the current is indicative of one or more characteristics of the at least one strand of the/each polynucleotide and thereby characterising the modified/template polynucleotide.

The methods of the invention may involve the measuring of a current passing through the pore as the polynucleotide moves with respect to the pore. Suitable conditions for measuring ionic currents through transmembrane protein pores are known in the art and disclosed in the Example. The method is typically carried out with a voltage applied across the membrane and pore. The voltage used is typically from +5 V to −5 V, such as from +4 V to −4 V, +3 V to −3 V or +2 V to −2 V. The voltage used is typically from −600 mV to +600 mV or −400 mV to +400 mV. The voltage used is preferably in a range having a lower limit selected from −400 mV, −300 mV, −200 mV, −150 mV, −100 mV, −50 mV, −20 mV and 0 mV and an upper limit independently selected from +10 mV, +20 mV, +50 mV, +100 mV, +150 mV, +200 mV, +300 mV and +400 mV. The voltage used is more preferably in the range 100 mV to 240 mV and most preferably in the range of 120 mV to 220 mV. It is possible to increase discrimination between different nucleotides by a pore by using an increased applied potential.
The methods are typically carried out in the presence of any charge carriers, such as metal salts, for example alkali metal salt, halide salts, for example chloride salts, such as alkali metal chloride salt. Charge carriers may include ionic liquids or organic salts, for example tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazolium chloride. In the exemplary apparatus discussed above, the salt is present in the aqueous solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl), caesium chloride (CsCl) or a mixture of potassium ferrocyanide and potassium ferricyanide is typically used. KCl, NaCl and a mixture of potassium ferrocyanide and potassium ferricyanide are preferred. The charge carriers may be asymmetric across the membrane. For instance, the type and/or concentration of the charge carriers may be different on each side of the membrane.
The salt concentration may be at saturation. The salt concentration may be 3 M or lower and is typically from 0.1 to 2.5 M, from 0.3 to 1.9 M, from 0.5 to 1.8 M, from 0.7 to 1.7 M, from 0.9 to 1.6 M or from 1 M to 1.4 M. The salt concentration is preferably from 150 mM to 1 M. The method is preferably carried out using a salt concentration of at least 0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least 0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M or at least 3.0 M. High salt concentrations provide a high signal to noise ratio and allow for currents indicative of the presence of a nucleotide to be identified against the background of normal current fluctuations.
The methods are typically carried out in the presence of a buffer. In the exemplary apparatus discussed above, the buffer is present in the aqueous solution in the chamber. Any buffer may be used in the method of the invention. Typically, the buffer is phosphate buffer. Other suitable buffers are HEPES and Tris-HCl buffer. The methods are typically carried out at a pH of from 4.0 to 12.0, from 4.5 to 10.0, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7 or from 7.0 to 8.8 or 7.5 to 8.5. The pH used is preferably about 7.5.
The methods may be carried out at from 0° C. to 100° C., from 15° C. to 95° C., from 16° C. to 90° C., from 17° C. to 85° C., from 18° C. to 80° C., 19° C. to 70° C., or from 20° C. to 60° C. The methods are typically carried out at room temperature. The methods are optionally carried out at a temperature that supports enzyme function, such as about 37° C.
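The broad operating ranges quoted above for voltage, salt concentration, pH and temperature can be collected into a small lookup and used to sanity-check a proposed set of run conditions. The following is only a sketch: the dictionary keys, the function name and the choice of which quoted range to use for each parameter are assumptions made for illustration.

```python
# Broad ranges quoted in the description; the key names are illustrative.
BROAD_RANGES = {
    "voltage_mV": (-600.0, 600.0),  # "typically from -600 mV to +600 mV"
    "salt_M": (0.1, 2.5),           # "typically from 0.1 to 2.5 M"
    "pH": (4.0, 12.0),              # "from 4.0 to 12.0"
    "temperature_C": (0.0, 100.0),  # "from 0 C to 100 C"
}


def out_of_range(conditions, ranges=BROAD_RANGES):
    """Return the names of conditions that fall outside their (low, high) range."""
    failures = []
    for name, value in conditions.items():
        low, high = ranges[name]
        if not (low <= value <= high):
            failures.append(name)
    return failures


# Values taken from the electrophysiology run in Example 1:
# -140 mV applied potential, 500 mM KCl, pH 8.0.
example_run = {"voltage_mV": -140.0, "salt_M": 0.5, "pH": 8.0}
print(out_of_range(example_run))
```

An empty list means every supplied condition lies inside the corresponding broad range. The narrower preferred ranges could be checked in the same way by passing a different `ranges` dictionary.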

Kits

The present invention also provides a kit for modifying a template polynucleotide. The kit comprises (a) a population of MuA substrates as defined above, (b) a MuA transposase and (c) a translocase. Any of the embodiments discussed above with reference to the methods and products of the invention equally apply to the kits.
The kit may further comprise the components of a membrane, such as the components of an amphiphilic layer or a lipid bilayer. The kit may further comprise the components of a transmembrane pore. The kit may further comprise a molecular brake. Suitable membranes, pores and molecular brakes are discussed above.
The kit may further comprise a Y adaptor comprising a leader sequence and/or one or more anchors capable of coupling the adaptor to a membrane. Suitable Y adaptors, leader sequences and anchors are discussed above.
The kit of the invention may additionally comprise one or more other reagents or instruments which enable any of the embodiments mentioned above to be carried out. Such reagents or instruments include one or more of the following: suitable buffer(s) (aqueous solutions), means to obtain a sample from a subject (such as a vessel or an instrument comprising a needle), means to amplify and/or express polynucleotides, a membrane as defined above or voltage or patch clamp apparatus. Reagents may be present in the kit in a dry state such that a fluid sample resuspends the reagents. The kit may also, optionally, comprise instructions to enable the kit to be used in the method of the invention or details regarding which patients the method may be used for. The kit may, optionally, comprise nucleotides.
The following Example illustrates the invention.

Example 1

MuA binds to the transposon as a tetramer and is extremely stable, remaining tightly bound after strand transfer of the transposon. If the MuA is not removed from the DNA, this can inhibit characterisation using a nanopore system. MuA can be removed by heating to 75° C. However, this relies on the use of a thermal cycler or water bath and could damage other components in the solution. Here we describe an alternative technique for removing MuA without needing to heat the reaction, using a helicase. Hel308Mbu-E284C/S615C-STrEP(C) (SEQ ID NO: with mutations E284C/S615C with a streptavidin tag attached at its C terminus) is a processive helicase which binds to single stranded DNA and moves in a 3′ to 5′ direction. When the transposon has a 3′ overhang on the bottom strand, Hel308Mbu-E284C/S615C-STrEP(C) (SEQ ID NO: 10 with mutations E284C/S615C with a streptavidin tag attached at its C terminus) can bind and, upon moving along the DNA, force the MuA complex to dissociate from the DNA.

Materials and Method

Enzyme Preparation:

Hel308Mbu-E284C/S615C-STrEP(C) (20 uM, SEQ ID NO: 10 with mutations E284C/S615C with a streptavidin tag attached at its C terminus) was reduced using 10 mM DTT in a 2 ml protein low bind Eppendorf and rotated on a Hula shaker (ThermoFisher Scientific) for 1 h, at 10 rpm with no vibration. The enzyme was then buffer exchanged into 100 mM sodium phosphate, 500 mM NaCl, 5 mM EDTA and 0.1% Tween-20 pH8.0, using Zeba spin desalting columns 7K MWCO, 0.5 ml (ThermoFisher Scientific) according to the manufacturer's protocol. The sample was diluted to 10 uM and 50 uM 1,11-bis(maleimido)triethylene glycol was added. The sample was then rotated on a Hula shaker for a further 2 hours. This resulted in a closed complex helicase which was able to load onto DNA at the 3′ end.

Adapter Annealing

The sequence for the transposon top strand was (SEQ ID NO: 115). This was annealed with either SEQ ID NO: 116 to form transposon 1 or annealed with SEQ ID NO: 117 to form transposon 2 which has a 3′ overhang on the bottom strand.
The transposon top strand was also annealed with the transposon leader (30 iSpC3 spacers attached at the 3′ end to the 5′ of SEQ ID NO: 118, which is attached at its 3′ end to the 5′ end of four iSp18 spacers which are attached at the 3′ end to the 5′ end of SEQ ID NO: 119).
Transposons (10 uM) were annealed in 50 mM NaCl, 10 mM Tris·HCl pH8.0. The transposon sequences were heated to 95° C. for 2 minutes and then slow cooled (6 seconds for every 0.1° C. decrease) to 4° C.
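To give a sense of how long the slow-cooling step takes, the ramp time follows directly from the stated rate of 6 seconds per 0.1° C. The sketch below is illustrative only; the function name and parameter names are assumptions, and the 2 minute hold at 95° C. is not included in the total.

```python
def ramp_duration_seconds(start_c, end_c, step_c=0.1, seconds_per_step=6.0):
    """Time needed to cool from start_c to end_c in fixed temperature steps."""
    # round() guards against floating point error in the division by 0.1.
    steps = round((start_c - end_c) / step_c)
    return steps * seconds_per_step


total_s = ramp_duration_seconds(95.0, 4.0)
print(f"{total_s:.0f} s = {total_s / 60.0:.0f} min")
```

Under these figures the cooling ramp from 95° C. to 4° C. spans 910 steps, which is about an hour and a half.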

Transpososome Formation

Transposon 1, transposon 2 and leader transposon were each mixed to 2 uM in 40 ul, with concentrated MuA transposase (20 ul, 1.1 mg/ml, ThermoFisher Scientific) in 25 mM Tris·HCl pH8, 110 mM NaCl, 0.5 mM EDTA, 10% glycerol and 0.05% Triton-X100. These were then incubated at 30° C. for 90 minutes to form transpososome 1, transpososome 2 and leader transpososome respectively, at 2 uM.

Transposition

Transpososome 1 and transpososome 2 were each mixed to 50 nM with 1.5 ug of PhiX174 RFI DNA (New England Biolabs) in 25 mM Tris·HCl pH8, 110 mM NaCl and 10 mM MgCl₂ in a 30 ul reaction in a 0.2 ml PCR tube. Each reaction was incubated at room temperature for 2 minutes before being split into 3 tubes of 10 ul each. For each transpososome, 1 tube was incubated at 75° C. for 5 minutes and 1 tube was left at room temperature for 5 minutes with nothing added. Hel308Mbu-E284C/S615C-STrEP(C) (1 uM) was added to the final tubes along with 10 mM of ATP (Sigma-Aldrich) and incubated at room temperature for 5 minutes. 1 ul of each reaction was then analysed on the Agilent 2100 Bioanalyser 12,000 bp setting, along with 1 ul of unmodified PhiX.
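The volume of transpososome needed for each 30 ul reaction follows from the usual dilution relationship C1·V1 = C2·V2. The sketch below assumes that the stock is the 2 uM transpososome prepared in the Transpososome Formation step; the function name is hypothetical.

```python
def stock_volume_ul(stock_nM, final_nM, final_volume_ul):
    """Volume of stock needed to reach final_nM in final_volume_ul (C1*V1 = C2*V2)."""
    if final_nM > stock_nM:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_nM * final_volume_ul / stock_nM


# Assumed 2 uM (2000 nM) stock diluted to 50 nM in a 30 ul reaction.
volume = stock_volume_ul(stock_nM=2000.0, final_nM=50.0, final_volume_ul=30.0)
print(f"{volume:.2f} ul of transpososome stock per reaction")
```

On that assumption, a little under 1 ul of stock is needed per reaction, with the remaining volume made up of DNA and buffer.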

Electrophysiology

A 60 ul sample was made with 1.5 ug of lambda DNA (New England Biolabs) and 120 nM of leader transpososome in 25 mM Tris·HCl pH8, 110 mM NaCl and 10 mM MgCl₂ and the sample mixed by inversion. The sample was incubated at room temperature for 10 minutes. The sample was then split into 3 sets of 20 ul reactions. nH2O (4 ul, ThermoFisher Scientific) was added to sample 1 and the sample was heated at 75° C. for 10 minutes. Hel308Mbu-E284C/S615C-STrEP(C) (2 ul, 10 uM) and ATP (2 ul, 100 mM, Sigma-Aldrich) were added to sample 2 and it was incubated at room temperature for 10 minutes. nH2O (4 ul, ThermoFisher Scientific) was added to sample 3 and the sample was incubated at room temperature for 10 minutes. Agencourt AMPure XP SPRI beads (24 ul) were added to each sample (1-3) and the samples were incubated at room temperature for 5 minutes. The samples were then transferred to a magnetic rack and incubated for 2 minutes at room temperature. The supernatant was then removed and discarded from each sample. Buffer was added to each sample (50 ul, 750 mM NaCl, 10% PEG8000 and 50 mM Tris·HCl pH8). The wash buffer was then removed and discarded from each sample. Buffer 1 (6 ul, 10 mM Tris·HCl, 20 mM NaCl) was then added to each sample and each sample was then mixed in order to resuspend the beads. Each sample was then spun down and returned to the magnetic rack. 6 ul of each sample was then removed and 1.5 ul of buffer 2 (1 uM of SEQ ID NO: 20 (which has 6 iSp18 spacers attached at its 3′ end), 750 mM KCl, 5 mM EDTA, 125 mM Kpi pH8) was added to each sample. The samples were then incubated at room temperature for 10 minutes. T4 Dda-(E94C/C109A/C136A/A360C) (SEQ ID NO: 97 with mutations E94C/C109A/C136A/A360C and then (ΔM1)G1G2 (where (ΔM1)G1G2=deletion of M1 and then addition G1 and G2), 1.25 ul, 5 uM), 25 mM Potassium phosphate, 150 mM KCl, 5% glycerol, 1 mM EDTA, pH7) was then added to each sample and then each sample was incubated at room temperature for 5 minutes.
Buffer (1.25 ul, 800 uM TMAD) was then added to each sample and then each was incubated at room temperature for 5 minutes. Finally, 6 ul of fuel mix (75 mM ATP, 75 mM MgCl₂) and 284 ul of buffer (25 mM Potassium phosphate, 500 mM potassium chloride, pH8) were added to each sample.
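As a rough check on the final composition of each sample, the volumes added after bead clean-up can be summed and used to estimate the final fuel concentration. This is a sketch under the assumption that the 6 ul recovered from the beads is carried forward and that every subsequent addition goes into the same tube; the component names and the helper function are illustrative.

```python
# Volumes in ul, taken from the steps above (assumed to be combined in one tube).
ADDITIONS_UL = {
    "eluted sample": 6.0,
    "buffer 2": 1.5,
    "T4 Dda helicase": 1.25,
    "TMAD buffer": 1.25,
    "fuel mix": 6.0,
    "dilution buffer": 284.0,
}


def final_concentration(stock_conc, added_volume_ul, total_volume_ul):
    """Concentration after diluting added_volume_ul of stock into total_volume_ul."""
    return stock_conc * added_volume_ul / total_volume_ul


total_ul = sum(ADDITIONS_UL.values())
atp_mM = final_concentration(75.0, ADDITIONS_UL["fuel mix"], total_ul)

print(f"total volume: {total_ul:.0f} ul")
print(f"final ATP (and MgCl2) from fuel mix: {atp_mM:.1f} mM")
```

On these assumptions the additions sum to 300 ul, so the fuel mix is diluted 50-fold.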
Electrical measurements were acquired from single MspA nanopores inserted in block co-polymer in buffer (25 mM K Phosphate buffer, 150 mM Potassium Ferrocyanide (II), 150 mM Potassium Ferricyanide (III), pH 8.0). After achieving a single pore inserted in the block co-polymer, then buffer (2 mL, 25 mM K Phosphate buffer, 150 mM Potassium Ferrocyanide (II), 150 mM Potassium Ferricyanide (III), pH 8.0) was flowed through the system to remove any excess MspA nanopores. 150 uL of 500 mM KCl, 25 mM K Phosphate, pH8.0 was then flowed through the system. After 10 minutes a further 150 uL of the sample described above was then flowed into the single nanopore experimental system. The experiment was run at −140 mV and helicase-controlled DNA movement was monitored.

Results

Agilent Analysis

When the MuA transposase was not removed from transpososome 1 (FIG. 1, line labelled 1) or transpososome 2 (FIG. 1, line labelled 2), i.e. the control in which both transpososomes were incubated at room temperature (sample 3), no peak was seen on the trace between the upper marker (labelled Y) and the lower marker (labelled X). This was because the MuA was still bound to the DNA, which prevented both transpososomes (1 and 2) from moving into the gel matrix of the Agilent 2100 Bioanalyser system.
When the sample was heated to 75° C. for 10 minutes, a peak was seen for both transpososomes (FIG. 2 (transpososome 1) and FIG. 3 (transpososome 2)) between the upper (Y) and lower (X) markers. This represents linearised PhiX with no MuA transposase bound to it.
Treatment with Hel308Mbu-E284C/S615C-STrEP(C) did not result in a PhiX peak for transpososome 1, as there was no 3′ overhang for the enzyme to load onto, so the MuA remained bound (see FIG. 4). For transpososome 2, a PhiX peak was seen after addition of Hel308Mbu-E284C/S615C-STrEP(C) because transpososome 2 had a 3′ overhang for the enzyme to load onto (see FIG. 5). This indicated that Hel308 was able to successfully remove MuA transposase from transposons.
FIG. 6 shows transpososome 2 after treatment with Hel308Mbu-E284C/S615C-STrEP(C) and heat treatment. The two PhiX peaks are of a similar height, indicating that Hel308 was just as efficient as heat at removing MuA transposase.

Electrophysiology Analysis

Electrophysiology experiments were carried out as described above and the throughput of the experiments (kilobases per nanopore per hour) was compared for sample 3 (incubation at room temperature in the absence of Hel308Mbu-E284C/S615C-STrEP(C)), sample 2 (incubation at 75° C. for 10 minutes) and sample 1 (incubation at room temperature with Hel308Mbu-E284C/S615C-STrEP(C) using a transpososome with a 3′ overhang). FIG. 11 shows a graph of throughput for samples 1-3. Sample 3 shows a throughput of around 20 kb/nanopore/hr, which is considerably lower than that of samples 1 and 2, showing that characterisation using a nanopore system was inhibited when the MuA transposase was not removed. Sample 2 (heat treatment) and sample 1 (treatment with Hel308Mbu-E284C/S615C-STrEP(C)) produce much higher throughput values, around 80 kb/nanopore/hr for sample 2 and 85 kb/nanopore/hr for sample 1. This shows that removal of MuA transposase using Hel308Mbu-E284C/S615C-STrEP(C) was as efficient as heat treatment. Removal of MuA transposase using Hel308Mbu-E284C/S615C-STrEP(C) resulted in improved characterisation using a nanopore system.
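For illustration, the approximate throughput values quoted above can be expressed relative to the untreated control. The Python sketch below is not part of the experimental analysis. The figures are the rounded values stated in the text (around 20, 80 and 85 kb/nanopore/hr), the treatment labels and variable names are hypothetical, and the 85 kb/nanopore/hr value is assumed to belong to the Hel308-treated sample, as the comparison with heat treatment implies.

```python
# Approximate throughput values (kb per nanopore per hour) as quoted in the text.
# Assumption: 85 kb/nanopore/hr corresponds to the Hel308-treated sample.
throughput_kb_per_pore_hr = {
    "control (MuA not removed)": 20.0,
    "heat treatment": 80.0,
    "Hel308 treatment": 85.0,
}

control = throughput_kb_per_pore_hr["control (MuA not removed)"]

# Fold change of each condition relative to the control.
fold_vs_control = {
    condition: value / control
    for condition, value in throughput_kb_per_pore_hr.items()
}

for condition, fold in fold_vs_control.items():
    print(f"{condition}: {fold:.2f}x control")
```

On these rounded figures, both removal methods give roughly a four-fold increase in throughput over the control.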

Example 2

This example describes using a number of different translocases to remove MuA transposase.

Materials and Methods

Adapter Annealing

A MuA adapter consisting of SEQ ID NO: 117 and 121 was annealed at 10 uM in 10 mM Tris-HCl (pH 7.5), 50 mM NaCl, by cooling from 95° C. to 22° C. at 2° C. per minute. This adapter contained the minimal MuA recognition sequence, with the pre-formed 5′ bottom strand flap, as well as a 12 nt 5′ tail on the top strand and a 10 nt 3′ tail on the bottom strand.
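As a simple illustration of the annealing step, the duration of a linear ramp follows from the temperature difference divided by the ramp rate. The Python sketch below uses a hypothetical helper name and assumes a strictly linear ramp with no hold steps, which the text does not specify.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Duration in minutes of a linear temperature ramp from start_c to end_c."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_min


# 95 C down to 22 C at 2 C per minute, as stated for adapter annealing.
anneal_minutes = ramp_minutes(95.0, 22.0, 2.0)
print(f"annealing ramp: {anneal_minutes} minutes")
```

Under that assumption the ramp takes 36.5 minutes.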

Transpososome Formation

A transposome complex was formed by addition of 1 ul of the MuA adapter, 4.5 ul of nuclease free water, 2 ul of 5× transposome buffer (125 mM Tris pH 8, 550 mM NaCl, 2.5 mM EDTA, 50% glycerol, 0.25% Triton-X100) and 2.5 ul of concentrated MuA transposase (Thermofisher). The mixture was then incubated at 30° C. for 1.5 hours.
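The four components listed above total 10 ul, so the 5× transposome buffer is diluted to 1×. The Python sketch below illustrates the resulting nominal concentrations; it is not part of the described method. The names are hypothetical, the adapter stock is taken to be 10 uM as in the annealing step, and contributions from the adapter annealing buffer and the MuA transposase storage buffer are ignored as a simplifying assumption.

```python
def dilute(stock_conc, stock_vol_ul, total_vol_ul):
    """Concentration after diluting stock_vol_ul of stock into total_vol_ul."""
    return stock_conc * stock_vol_ul / total_vol_ul


# Component volumes (ul) as listed for transposome complex formation.
volumes_ul = {
    "MuA adapter": 1.0,
    "nuclease free water": 4.5,
    "5x transposome buffer": 2.0,
    "MuA transposase": 2.5,
}
total_ul = sum(volumes_ul.values())

# Composition of the 5x transposome buffer as stated in the text.
buffer_5x = {
    "Tris pH 8 (mM)": 125.0,
    "NaCl (mM)": 550.0,
    "EDTA (mM)": 2.5,
    "glycerol (%)": 50.0,
    "Triton-X100 (%)": 0.25,
}

# Nominal final concentrations; other components' own buffers are ignored (assumption).
buffer_final = {
    name: dilute(conc, volumes_ul["5x transposome buffer"], total_ul)
    for name, conc in buffer_5x.items()
}

# Adapter stock assumed to be 10 uM, as in the annealing step.
adapter_final_uM = dilute(10.0, volumes_ul["MuA adapter"], total_ul)

print(f"total volume: {total_ul} ul")
for name, conc in buffer_final.items():
    print(f"{name}: {conc:g}")
print(f"adapter: {adapter_final_uM:g} uM")
```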

Transposition

A transposition reaction, containing 10 ul of 5× transposase buffer (125 mM Tris pH 8, 550 mM NaCl, 50 mM MgCl2), 5 ul transposome, 2.5 ug PhiX RFI (NEB) and nuclease free water to 50 ul, was then carried out at room temperature for 10 minutes. After 10 mins 6.25 ul of 100 mM rATP was added and the reaction was split into 5×11.25 ul. To samples (i) and (ii) 1.25 ul of nuclease free water was added; to sample (iii) 1.25 ul of Hel308Mbu-E284C-STrEP(C) (SEQ ID NO: 10 with mutation E284C with a streptavidin tag attached at its C terminus) was added; to sample (iv) 1.25 ul of T4 Dda-(E94C/F98W/C109A/C136A/A360C) (SEQ ID NO: 97 with mutations E94C/F98W/C109A/C136A/A360C and then (ΔM1)G1G2 (where (ΔM1)G1G2=deletion of M1 and then addition of G1 and G2)) was added; and to sample (v) 1.25 ul of UvrD Eco-(E117C/M380C)-STrEP (SEQ ID NO: 122 with mutations E177C/M380C with a streptavidin tag attached at the C terminus) was added. Samples (i), (iii), (iv) and (v) were then left at room temperature for 10 mins while sample (ii) was left at 75° C. for 10 mins. All samples were then loaded onto an Agilent DNA 12000 chip to look for tagmentation products.
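The volumes in the transposition step can be checked arithmetically: adding 6.25 ul of rATP to the 50 ul reaction gives 56.25 ul, which divides into five 11.25 ul aliquots. The Python sketch below works through this and the resulting nominal rATP concentration. It is illustrative only; the variable names are hypothetical, and it assumes additive volumes, an even split of the DNA, and no pipetting loss.

```python
# Quantities stated in the text.
reaction_ul = 50.0        # transposition reaction volume
atp_stock_mM = 100.0      # rATP stock concentration
atp_added_ul = 6.25       # rATP volume added after 10 minutes
n_aliquots = 5            # samples (i) to (v)
addition_ul = 1.25        # water or translocase added to each aliquot
dna_ug = 2.5              # PhiX RFI in the reaction
mgcl2_5x_mM = 50.0        # MgCl2 in the 5x transposase buffer
buffer_5x_ul = 10.0       # volume of 5x buffer in the reaction

# MgCl2 in the 1x reaction before rATP is added.
mgcl2_reaction_mM = mgcl2_5x_mM * buffer_5x_ul / reaction_ul

# Volume and rATP concentration after the rATP addition.
volume_after_atp_ul = reaction_ul + atp_added_ul
atp_after_addition_mM = atp_stock_mM * atp_added_ul / volume_after_atp_ul

# Even split into aliquots (assumes no loss), then the 1.25 ul addition.
aliquot_ul = volume_after_atp_ul / n_aliquots
final_aliquot_ul = aliquot_ul + addition_ul
atp_final_mM = atp_after_addition_mM * aliquot_ul / final_aliquot_ul
dna_per_aliquot_ug = dna_ug / n_aliquots

print(f"aliquot volume: {aliquot_ul} ul, final volume: {final_aliquot_ul} ul")
print(f"rATP in each final sample: {atp_final_mM:.2f} mM")
print(f"DNA per sample: {dna_per_aliquot_ug} ug")
```

On these assumptions each sample has a final volume of 12.5 ul containing nominally 10 mM rATP and 0.5 ug of DNA.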

Results

FIGS. 7 to 10 show a number of Agilent traces for samples (i)-(v). Sample (i) was a control where no translocase was added and the sample was not heated. FIGS. 7 to 10 all include this control, for which no tagmentation peak was observed. This was because the MuA was still bound to the DNA, which prevented the transpososome from moving into the gel matrix of the Agilent 2100 Bioanalyser system. FIG. 7 also shows sample (ii) (line 2), which shows a clear tagmentation peak when the sample was heated to 75° C. in order to remove the MuA transposase.
FIG. 8 shows sample (iii, line 3) and the control sample (i, line 1). Sample (iii) shows a clear tagmentation peak when the sample was incubated with Hel308Mbu-E284C-STrEP(C) in order to remove the MuA transposase. This indicated that Hel308Mbu-E284C-STrEP(C) was able to successfully remove MuA transposase from transposons.
FIG. 9 shows sample (iv, line 4) and the control sample (i, line 1). Sample (iv) shows a clear tagmentation peak when the sample was incubated with T4 Dda-(E94C/F98W/C109A/C136A/A360C) in order to remove the MuA transposase. This indicated that T4 Dda-(E94C/F98W/C109A/C136A/A360C) was able to successfully remove MuA transposase from transposons.
FIG. 10 shows sample (v, line 5) and the control sample (i, line 1). Sample (v) shows a clear tagmentation peak when the sample was incubated with UvrD Eco-(E117C/M380C)-STrEP in order to remove the MuA transposase. This indicated that UvrD Eco-(E117C/M380C)-STrEP was able to successfully remove MuA transposase from transposons.

Claims

1. A method for modifying a template double stranded polynucleotide, comprising:

(a) contacting the template polynucleotide with a MuA transposase and a population of double stranded MuA substrates each comprising an overhang at one or both ends of one strand such that the transposase fragments the template polynucleotide and ligates a substrate to one or both ends of the double stranded fragments and thereby producing a plurality of fragment/substrate constructs; and

(b) using a translocase to remove the MuA transposases from the constructs and thereby producing a plurality of modified double stranded polynucleotides.

2. A method according to claim 1, wherein the translocase is contacted with the constructs after they are created by the MuA transposase.

3. A method according to claim 1, wherein the translocase is bound to the substrates before the substrates are contacted with the template polynucleotide.

4. A method according to claim 1, wherein the translocase is a helicase.

5. A method according to claim 4, wherein the helicase is from superfamily 1 or superfamily 2; optionally wherein the helicase is a member of one of the following families: Pif1-like, Upf1-like, UvrD/Rep, Ski-like, Rad3/XPD, NS3/NPH-II, DEAD, DEAH/RHA, RecG-like, RecQ-like, T1R-like, Swi/Snf-like and Rig-I-like; or wherein the helicase is a UvrD helicase, a Hel308 helicase, a TraI helicase, a TraI subgroup helicase, an XPD helicase or a Dda helicase.

6-7. (canceled)

8. A method according to claim 4, wherein the helicase is a Hel308 helicase; optionally wherein the Hel308 helicase is Hel308 Mbu (E284C/S615C)-bismaleimidePEG11 (SEQ ID NO: 10 with mutations E284C/S615C connected by a bismaleimidePEG11 linker).

9. (canceled)

10. A method according to claim 1, wherein the translocase is a strippase; optionally wherein the strippase is the INO80 chromatin remodeling complex or a FtsK/SpoIIIE transporter.

11. (canceled)

12. A method according to claim 1, wherein the two strands of each construct are linked at one end by a hairpin loop.

13. A method according to claim 1, wherein the method further comprises attaching molecular brakes to the other strands in the substrates.

14. A method according to claim 13, wherein the molecular brakes are attached to the other strands in the substrates before they are contacted with the template polynucleotide and the MuA transposase.

15. A method according to claim 13, wherein the molecular brakes are attached to the other strands from the substrates remaining in the constructs after they are created by the MuA transposase.

16. A method according to claim 13, wherein the molecular brakes are bound to Y adaptors comprising a leader sequence and/or one or more anchors capable of coupling the adaptor to a membrane and the Y adaptors are attached to the other strands in step (c).

17. A method according to claim 13, wherein the molecular brakes are derived from a polymerase, a helicase or an exonuclease.

18. (canceled)

19. A population of double stranded MuA substrates for modifying a template polynucleotide, wherein each substrate comprises an overhang at one or both ends of one strand and a translocase bound to an overhang.

20. A plurality of polynucleotides modified using a method according to claim 1.

21. A method of characterising at least one polynucleotide modified using a method according to claim 1, comprising:

a) contacting the modified polynucleotide with a transmembrane pore such that at least one strand of the polynucleotide moves through the pore; and

b) taking one or more measurements which are indicative of one or more characteristics of the polynucleotide as the at least one strand moves with respect to the pore and thereby characterising the modified polynucleotide.

22. A method of characterising a template polynucleotide, comprising:

a) modifying the template polynucleotide using a method according to claim 1 to produce a plurality of modified polynucleotides;

b) contacting each modified polynucleotide with a transmembrane pore such that at least one strand of each polynucleotide moves through the pore; and

c) taking one or more measurements which are indicative of one or more characteristics of the polynucleotide as the at least one strand of each polynucleotide moves with respect to the pore and thereby characterising the template polynucleotide.

23. A method according to claim 21, wherein the one or more characteristics are selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide and (v) whether or not the polynucleotide is modified.

24. A method according to claim 21, wherein the method comprises measuring the current passing through the pore as the at least one strand or each polynucleotide moves with respect to the pore.

25. A kit for modifying a template polynucleotide comprising (a) a population of MuA substrates as defined in claim 1, (b) a MuA transposase and (c) a translocase; optionally wherein the kit further comprises a polynucleotide protein and/or a Y adaptor comprising a leader sequence and/or one or more anchors capable of coupling the adaptor to a membrane.

26. (canceled)