Nucleic Acid Analysis
The present invention relates to nucleic acid analysis and in particular, but not exclusively, computational aspects of nucleic acid analysis.
There are several techniques which use the hybridisation of a sequence-specific binding molecule, such as a single-stranded DNA molecule, with a complementary nucleic acid sequence to detect and quantify the nucleic acid sequence. These include producing DNA probes which, prior to hybridisation, are labelled to enable subsequent detection. One labelling strategy is the incorporation of 32P-labelled nucleotides, for example by nick translation, primer extension or end filling. An alternative is the incorporation of biotinylated, enzyme-conjugated or fluorescently-labelled nucleotides into the DNA probe. Autoradiography is used to detect the 32P radio-labelled probes; a problem with this labelling approach is the potentially hazardous use of radioactive material. Alkaline phosphatase- or peroxidase-conjugated antibodies and detection using a chromogenic reagent may be used to detect the alternatively labelled probes, although such approaches require several time-consuming incubation steps.
Conventional labels are moreover difficult to differentiate in large numbers; there is only a limited number of fluorophores or radiolabels which can be used to differentially detect different probes in a pooled detection reaction. A partial solution to this problem is provided by the use of arraying techniques; however, arrays have their own disadvantages, in that they are expensive to produce and analyse, and must be specifically adapted for each individual requirement.
Mass spectrometry is an analytical spectroscopic tool primarily concerned with the separation of molecular and atomic species according to their mass and can be used in the analysis of many types of sample from elemental to large proteins and polymers. Mass spectrometry is used to measure the molecular mass of a molecule by determining the molecule's flight path through a set of magnetic and electric fields.
The use of mass spectrometry, specifically matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), is known for the separation of heterooligomer fragments with unique molecular weights, in particular for the quantitative analysis of single-nucleotide polymorphisms (SNPs). Captured, single-stranded DNA molecules are prepared by PCR amplification and are hybridised with peptide nucleic acid (PNA) probes in an allele-specific fashion. MALDI-TOF MS is then used to detect the PNA probes and identify the SNP. This spectrometry facilitates a rapid, precise and unambiguous analysis, and also has advantages over alternative analytical techniques such as chip-based array strategies. However, the inclusion of an amplification step is problematic since it is a time-consuming process and it is generally acknowledged that amplification under-represents rare sequences. Further, probes have only been developed for very specific roles.
Summary of the Invention
The present invention recognises that mass analysis to identify sequence-specific binding molecules has the potential to be developed into a very powerful technique. At present no methodology exists to systematically construct a set, or repertoire, of sequence-specific binding molecules which are differentiable by mass. The present invention provides a method for constructing such a repertoire.
According to an aspect of the present invention, there is provided a method for constructing a repertoire of oligomers differentiable by mass, comprising: a) providing a heterogeneous pool of monomers, wherein said monomers are modified by addition of one or more of a selection of mass labels; b) optionally, providing a heterogeneous pool of unlabelled monomers; c) determining the monomer sequences of the oligomers to be represented in the repertoire and calculating the number and nature of the mass labels to be incorporated into each monomer such that each oligomer differs in mass; and d) assembling a plurality of labelled monomers and, optionally, one or more unlabelled monomers, to form the oligomers.
Advantageously, a set of oligomers is constructed which can be used to analyse nucleic acid. The repertoire is constructed so that each oligomer with a different sequence has a different mass characteristic. The members of the repertoire which hybridised to the nucleic acid can then be identified by a mass analysis. Applications include gene expression analysis, sequencing, genotyping, discovery, differential display, nucleic acid diagnostics, nucleic acid sequencing, mutation detection analysis, polymorphism analysis and the analysis of variation within species from microorganisms to plants to humans.
The oligomers may be any molecules which can bind to nucleic acids in a sequence-specific manner. Preferably the oligomers are oligopeptides or oligonucleotides which may comprise derivatised nucleotides comprising one or more modifying groups. In preferred embodiments the modifying groups comprise terminus modifications, backbone modifications and/or base modifications. In some embodiments the oligonucleotides comprise PNAs.
In another aspect, the invention provides a method for analysing nucleic acid in a biological sample, comprising the steps of: a) immobilising the nucleic acid(s) in the sample onto a solid support; b) hybridising to the nucleic acid(s) at a desired stringency a repertoire of oligonucleotides, and eluting those members of the repertoire which do not hybridise at the desired stringency; c) eluting the repertoire members hybridised in step (b) and analysing said members to resolve their mass.
A powerful technique to detect and quantify nucleic acid sequences based on the identification of oligomers according to their mass is provided. The technique does not suffer from the disadvantages associated with 32P-labelling or forming biotinylated or fluorescein-conjugated probes and when coupled with a mass spectrometric analysis gives rapid, precise and unambiguous results.
The nucleic acid in the biological sample may be DNA or RNA including mRNA and the biological sample may comprise biological tissue or purified nucleic acid. Preferably the nucleic acid is not amplified. Advantageously, the problems associated with amplification are avoided. The invention is particularly applicable to the detection and quantification of RNA in biological samples, including cells and tissues.
In a further aspect there is provided a method for analysing nucleic acid in a biological sample, comprising the steps of: a) immobilising the nucleic acid(s) in the sample onto a solid support; b) binding to the nucleic acid(s) under desired conditions a repertoire of oligopeptides, and eluting those members of the repertoire which do not bind with the desired affinity; c) eluting the repertoire members bound in step (b) and analysing said members to resolve their mass.
Preferably the analysis to resolve the mass of the members is performed using mass spectrometry, such as MALDI-TOF mass spectrometry.
According to another aspect of the invention, there is provided a method for calculating the number and nature of mass labels to be incorporated into each monomer of each oligomer to be represented in a repertoire, the method comprising, for each oligomer, determining whether the oligomer is to be labelled and, if so: adding the molecular weight of one or more mass labels to the molecular weight of the oligomer to give a combined molecular weight which is unique within the repertoire; and assigning said one or more mass labels to one or more of the monomers in the oligomer.
Brief Description of the Drawings
Figures 1A to 1F illustrate the construction of an oligomer;
Figures 2A to 2D illustrate four oligomers;
Figures 3A to 3D illustrate the four oligomers constructed in accordance with an embodiment of the invention;
Figure 4 shows an example of a repertoire, constructed in accordance with an embodiment of the invention;
Figure 5A shows a table of input data;
Figure 5B shows a table of output data;
Figure 6 is a flow diagram of the operation of a method, in accordance with an embodiment of the invention, for calculating the number and nature of mass labels to be incorporated in each monomer of each oligomer;
Figure 7 is a flow diagram of the operation of an embodiment of the method;
Figure 8 shows a table of data for use in an embodiment of the method;
Figure 9 shows another table for use in an embodiment of the method;
Figures 10, 11 and 12 are flow diagrams of the operation of embodiments of the method;
Figure 13 is a block diagram illustrating the components of a data processing apparatus in the form of a computer system; and
Figures 14 and 15 are block diagrams illustrating components of systems for performing the invention.
Figure 16 is a photograph of a gel with hybridised oligonucleotides. From right to left: beta oligo hybridised alone (2 lanes); gamma oligo hybridised alone (2 lanes); beta gamma mixture (0.5 micrograms RNA on slide); beta gamma mixture (2.5 micrograms RNA on slide).
Figure 17 is the output of a mass spectrometer as used in Example 1.
Detailed Description of the Invention
The term "oligomer" is used to describe any molecule which is formed of monomeric units and which can bind to nucleic acid in a sequence-specific manner. The term is used herein to mean a relatively small polymer of complexity greater than that of a monomer, although in the context of the present invention the precise size is not important. Further, the term is used to describe both hetero- and homo-oligomers. "Oligonucleotide" is used as a term to describe an oligomer comprising nucleotides and/or other similar monomeric units and encompasses molecules such as peptide nucleic acids. "Nucleic acid", as used herein, refers to natural or synthetic nucleic acid or nucleic acid analogues, including RNA or DNA. "Mass spectrometry", in its broadest sense, is used to denote a technique which can distinguish between atomic or molecular species according to their mass. An example is MALDI Mass Spectrometry of which one particular example is MALDI-TOF Mass Spectrometry.
Further terminus modifications suitable for mass labelling include psoralen, rhodamine, ROX, SH (thiol), spacers, TAMRA, TET, Texas Red, and thiol (SH); backbone modifications include methylphosphonate (MP), 2'-OMe-methylphosphonate RNA, phosphorothioate (PS), RNA, and 2'-OMe RNA; and base modifications include 2-amino-dA, 2-aminopurine, 3'-(ddA), 3'-dA (cordycepin), 7-deaza-dA, 8-Br-dA, 8-oxo-dA, N6-Me-dA, abasic site (dSpacer), Biotin dT, 2'-OMe-5-Me-C, 2'-OMe-propynyl-C, 3'-(5-Me-dC), 3'-(ddC), 5-Br-dC, 5-I-dC, 5-Me-dC, 5-F-dC, carboxy-dT, convertible dA, convertible dC, convertible dG, convertible dT, convertible dU, 7-deaza-dG, 8-Br-dG, 8-oxo-dG, O6-Me-dG, S6-DNP-dG, 4-methyl-indole, 5-nitroindole, 2'-OMe-inosine, 2'-dI, O6-phenyl-dI, 4-methyl-indole, 2'-deoxynebularine, 5-nitroindole, 2-aminopurine, dP (purine analogue), dK (pyrimidine analogue), 3-nitropyrrole, dSpacer (abasic site), 2-thio-dT, 4-thio-dT, biotin-dT, carboxy-dT, O4-Me-dT, O4-triazol dT, 2'-OMe-propynyl-U, 5-Br-dU, 2'-dU, 5-F-dU, 5-I-dU and O4-triazol dU. For the purposes of the present invention, it is to be understood that any modifying group available in the art may be used.
In the schematic representation given in Figures 3A to 3D, only one nucleotide in each of the oligonucleotides bears a modifying group. However, in accordance with the present invention, oligomers may be mass labelled at more than one nucleotide. For instance, oligomers may be mass labelled at more than one but not all of the nucleotides, or oligomers may be mass labelled at all of the nucleotides. Where more than one modifying group is incorporated in an oligomer, a different modifying group or the same modifying group may be used on each nucleotide. In some instances more than one modifying group may be attached to the same nucleotide. Further, in certain cases no modifying groups will be required. Figure 4 illustrates a selection of such oligomers which, provided each oligomer has a unique molecular weight within the selection, represents a repertoire in accordance with the invention. Such a repertoire may include any number of oligomers with any number of nucleotides and any number or combination of modifying groups. Although Figure 4 illustrates a selection of oligomers each with the same number of nucleotides, a repertoire can include oligomers with different numbers of nucleotides. For instance, a repertoire in accordance with the invention may comprise some hexamers, some 9-mers, some 12-mers and so on.
The invention will be described for a repertoire of sequence-specific binding molecules in the form of oligonucleotides. Those skilled in the art will appreciate that other molecules are suitable for performing the invention. For example, oligopeptides can also be constructed and used in accordance with the invention.
Nucleic acids
Oligomers in accordance with the present invention may be constructed from any monomeric units, but are preferably nucleic acid oligomers. A nucleic acid, as referred to herein, may be any nucleic acid, including DNA and RNA, as well as synthetic nucleic acid homologues such as backbone-modified nucleic acids including methylphosphonates, phosphorothioates and phosphorodithioates, where both of the non-bridging oxygens are substituted with sulphur; phosphoroamidites; alkyl phosphotriesters and boranophosphates. Achiral phosphate derivatives include 3'-O-5'-S-phosphorothioate, 3'-S-5'-O-phosphorothioate, 3'-CH2-5'-O-phosphonate and 3'-NH-5'-O-phosphoroamidate. Peptide nucleic acids replace the entire phosphodiester backbone with a peptide linkage.
Sugar modifications are also used to enhance stability and affinity. The α-anomer of deoxyribose may be used, where the base is inverted with respect to the natural β-anomer. The 2'-OH of the ribose sugar may be altered to form 2'-O-methyl or 2'-O-allyl sugars, which provides resistance to degradation without compromising affinity.
Modification of the heterocyclic bases must maintain proper base pairing. Some useful substitutions include deoxyuridine for deoxythymidine; 5-methyl-2'-deoxycytidine and 5-bromo-2'-deoxycytidine for deoxycytidine. 5-propynyl-2'-deoxyuridine and 5-propynyl-2'-deoxycytidine have been shown to increase affinity and biological activity when substituted for deoxythymidine and deoxycytidine, respectively.
A ribonucleic acid, as referred to herein, may be natural or modified RNA. Advantageously, the RNA may comprise one or more of the modifications identified above.
Oligonucleotide Synthesis
Oligonucleotides with defined sequences can be synthesised chemically. Generally, the synthesis is based on the ability to protect either the 5' or 3' end of a mono- or oligonucleotide. For example, an oligonucleotide can be synthesised by solid-phase phosphoramidite chemistry. The 3' end of a mononucleotide containing a first base is attached to an inert support in a reaction vessel and the oligonucleotide is built up one nucleotide at a time from 3' to 5' by a cyclic process. A 5'-protected nucleotide precursor containing a second base is added to the reaction vessel and the 5' hydroxyl of the first base reacts with the 3' phosphorus of the second base. The resulting phosphite is oxidised to the stable phosphate and the protecting group used to protect the 5' end of the added nucleotide is removed. An oligonucleotide containing two bases is thus produced. The cycle can then be repeated to add a third nucleotide containing a third base and then a fourth nucleotide containing a fourth base, and so on.
Figures 1A-1F illustrate how an oligonucleotide, here a hexamer, is constructed using such a cyclic process. As shown in Figure 1A, the first mononucleotide containing a first base, an adenine (A), is attached to an inert support 10. A second mononucleotide, also containing an adenine (A), is added by a single iteration of the cycle to arrive at the structure illustrated in Figure 1B. Further single iterations of the cycle are completed to add, in turn, third, fourth, fifth and sixth mononucleotides containing a guanine (G), a guanine (G), a cytosine (C) and a thymine (T) respectively. The structure after each iteration of the cycle is shown in Figures 1C, 1D, 1E and 1F, Figure 1F representing the constructed hexamer.
Other chemical methods for the synthesis of oligonucleotides are known in the art and include triester, phosphite and H-phosphonate methods, PCR and other autoprimer methods as well as oligonucleotide synthesis on solid supports. It will be appreciated that an oligonucleotide, or other oligomer, can be constructed by adding more than one monomer at a time.
Given the guidance provided herein, the repertoire of oligomers of the invention is obtainable according to methods known in the art. Oligonucleotide synthesis can be performed using a programmable synthesis machine, such as an Applied Biosystems ABI 381A.
The invention will now be described for the synthesis of a repertoire of four oligonucleotide hexamers. The four hexamers have the sequences shown in Figures 2A to 2D and have been artificially selected, for the purposes of this description, to each have two adenine bases (A), two guanine bases (G), one cytosine base (C) and one thymine base (T). The first hexamer (Figure 2A) has the sequence 3'-AAGGCT-5', the second hexamer (Figure 2B) has the sequence 3'-AGAGTC-5', the third (Figure 2C) 3'-TCAGGA-5' and the fourth (Figure 2D) 3'-GACTAG-5'. In their unlabelled state these hexamers have identical molecular weights and would be unresolvable by mass spectrometry.
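The point that the four unlabelled hexamers share one molecular weight can be checked with a minimal sketch. The residue masses and the terminal correction below are approximate, commonly quoted average values and are assumptions for illustration only; they are not taken from this specification, and `unlabelled_mass` is a hypothetical helper name.

```python
from collections import Counter

# Approximate average residue masses (Da) of deoxynucleotides within a DNA strand.
# Illustrative assumptions, not values given in the specification.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
TERMINAL_CORRECTION = -61.96  # assumed correction for an unphosphorylated oligonucleotide


def unlabelled_mass(sequence):
    """Mass from base composition only; the order of the bases does not matter."""
    counts = Counter(sequence)
    total = sum(counts[base] * RESIDUE_MASS[base] for base in "ACGT")
    return round(total + TERMINAL_CORRECTION, 2)


# The four hexamers of Figures 2A to 2D.
hexamers = ["AAGGCT", "AGAGTC", "TCAGGA", "GACTAG"]
masses = {seq: unlabelled_mass(seq) for seq in hexamers}
```

Because the mass depends only on composition, every entry in `masses` is the same, which is why mass labels are needed to tell these sequences apart.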
In accordance with the present invention, each hexamer is ascribed a unique mass by selective mass labelling. This selective mass labelling is performed by incorporating one or more derivatised nucleotides in the hexamer, each derivatised nucleotide having one or more modifying groups. A simple representation of the selectively mass labelled hexamers is given in Figures 3A to 3D. In this simple representation each hexamer is labelled in only one position, selected arbitrarily. The first hexamer (Figure 3A) is labelled by incorporating a derivatised nucleotide in the second position (represented as Mod1-A). The second hexamer is labelled by similarly incorporating a derivatised nucleotide, arbitrarily here at the fourth position and represented as Mod2-G in Figure 3B. The third and fourth mass labelled hexamers are shown in Figures 3C and 3D, each with a derivatised nucleotide in the third position, represented as Mod3-A and Mod4-C respectively. Mod1-, Mod2-, Mod3- and Mod4- each represent a modifying group held on the respective nucleotide. Importantly, Mod1-, Mod2-, Mod3- and Mod4- are selected so that the mass-labelled hexamers are differentiable by mass. In this case, since each hexamer comprises a different sequence of the same set of bases and the hexamers therefore have the same molecular weight in their unlabelled states, Mod1-, Mod2-, Mod3- and Mod4- are selected to themselves have different molecular weights. Thus a repertoire of four oligonucleotides resolvable by mass analysis, for instance mass spectrometry, is established.
To synthesise these mass labelled hexamers, a chemical synthesis corresponding to the one described above in relation to Figures 1A to 1F is used. These Figures illustrate the process to form the first non mass labelled hexamer of Figure 2A. To produce the first mass labelled hexamer represented in Figure 3A, an identical process is used except, in the iteration of the cycle resulting in the structure shown in Figure 1B, a derivatised nucleotide precursor containing an adenine (A) is used instead of a non-derivatised nucleotide precursor containing an adenine (A). In this example the derivatised nucleotide precursor bears the modifying group Mod1.
Mod1-, Mod2-, Mod3- and Mod4- represent any modification which changes the molecular weight of the associated nucleotide. Although such a modification may be incorporated through isotopic labelling, Mod1-, Mod2-, Mod3- and Mod4- preferably represent a modifying group, in particular a chemical group, attached to the nucleotide. As any chemical group will have an associated atomic weight, a vast array of possible modifying groups is envisaged. Particular modifying groups suitable for mass labelling of oligonucleotides include terminus modifications, backbone modifications and base modifications. Terminus modifications include acridine, amine (NH2), biotin-TEG, Cascade Blue, cholesterol, Cy3, Cy5, Dabcyl, digoxigenin, DNP (dinitrophenyl), Edans, 6-FAM, fluorescein, 3'-glyceryl, HEX, JOE, PO4 (phosphate),
The repertoire will be selected for a particular application; applications include gene expression analysis, sequencing, genotyping, discovery, differential display, DNA diagnostics, DNA sequencing, mutation detection analysis, polymorphism analysis and the analysis of variation within species from micro-organisms to plants to humans.
DNA Probes and Mass Spectrometry
An oligonucleotide can be designed and synthesised to hybridise selectively with a target sequence of a nucleic acid and can therefore act as a probe. If the target sequence is present on the nucleic acid the oligonucleotide will hybridise, otherwise no hybridisation will occur. A repertoire of oligonucleotide probes, each with a unique molecular weight, can be constructed in accordance with the invention. Each probe can be designed to hybridise with a particular target sequence of nucleic acid. In general the oligomers as probes should be of sufficient length and sufficiently unambiguous so that false positive results are minimised.
In accordance with an aspect of the present invention, a repertoire of oligonucleotides is used for analysing nucleic acid. The analysis is a direct analysis and no amplification step is required. The nucleic acid is immobilised onto a solid support in its purified form or its native state, for example as a piece of tissue or cultured cells. The repertoire of oligonucleotides is hybridised to the nucleic acid under a desired stringency and those members of the repertoire which do not hybridise at the desired stringency are eluted. The repertoire members which hybridise are eluted and, each having a unique molecular weight, are analysed to resolve their mass and in turn their identity and quantity. The mass resolution can be performed using a mass spectrometer and such an analysis can be used to determine the target sequences present, or indeed absent, on the nucleic acid.
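Resolving the eluted probes by mass and then mapping each mass back to a sequence can be sketched as a simple lookup. The tolerance parameter, the function name `identify_peaks` and the example masses are all assumptions introduced for illustration; the specification does not prescribe a matching rule or an instrument resolution.

```python
def identify_peaks(observed_masses, repertoire, tolerance=0.5):
    """Map observed peak masses (Da) to repertoire sequences.

    repertoire maps sequence -> unique labelled mass (Da).
    tolerance is an assumed matching window, not a value from the text.
    """
    hits = {}
    for peak in observed_masses:
        for sequence, mass in repertoire.items():
            if abs(peak - mass) <= tolerance:
                hits[peak] = sequence
                break  # masses are unique, so the first match is the only one
    return hits


# Hypothetical labelled masses for two of the hexamers.
example_repertoire = {"AAGGCT": 1832.3, "AGAGTC": 1846.3}
example_hits = identify_peaks([1832.1, 1900.0], example_repertoire)
```

Peaks with no repertoire member inside the window are simply left out of the result, which corresponds to target sequences that are absent from the sample. The lookup is only unambiguous if the repertoire masses are spaced further apart than the chosen window.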
One particularly suitable form of mass spectrometry for use with an aspect of the present invention is matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry using, for instance, the PE Corporation/Applied Biosystems Voyager™ DE MALDI-TOF Mass Spectrometer. A description of this mass spectrometer can be found in the product-related literature and a description of the use of MALDI-TOF mass spectrometry is given in "Genetic analysis by peptide nucleic acid affinity MALDI-TOF mass spectrometry", Griffin, Tang and Smith, Nature Biotechnology Volume 15 December 1997, incorporated herein by reference. In this paper, the technique is used for the detection of point mutations in a target sequence of a nucleic acid. A biotinylated nucleic acid is immobilised on streptavidin-coated magnetic beads and a pair of mass labelled PNA probes is hybridised to the nucleic acid. After washing, only one of the PNA probes (corresponding to the precise sequence) remains on the target. The beads are applied to the MALDI probe tip and the matrix is added and allowed to crystallise, dissociating the hybridised probe from the target. Matrix-assisted laser desorption/ionisation of the dissociated PNA probe is performed, with the DNA target remaining immobilised on the probe tip, and the desorbed PNA probe is detected by time-of-flight mass spectrometry. The resulting mass spectrum contains a peak at a mass-to-charge ratio corresponding to the hybridised PNA probe.
This type of analysis is suitable for the analysis of the present invention. However, in the present invention a large number of oligomers may be and is intended to be included in the repertoire of oligomers. Consequently, the monomer sequences of the oligomers to be represented in the repertoire have to be determined and the number and nature of the mass labels to be incorporated into each oligomer have to be calculated such that each oligomer differs in resolvable mass. In addition the resulting mass spectrum has to be analysed. The present invention recognises that a computational approach is suitable. Methods for performing these steps, in particular calculating the number and nature of the mass labels for each oligomer, are set out in the section below, entitled "Computational Aspects".
The use of MALDI-TOF spectrometry is also described in "Quantitative Approach to Single-Nucleotide Polymorphism Analysis Using MALDI-TOF Mass Spectrometry", Ross, Hall and Haff, BioTechniques 29:620-629 (September 2000); "MALDI-TOF based mutation detection using tagged in vitro synthesised peptides", Garvin, Parker and Haff, Nature Biotechnology Volume 18 January 2000; "High level multiplex genotyping by MALDI-TOF mass spectrometry", Ross, Hall, Smirnov and Haff, Nature Biotechnology Volume 16 December 1998; and "Discrimination of Single-Nucleotide Polymorphisms in Human DNA Using Peptide Nucleic Acid Probes Detected by MALDI-TOF Mass Spectrometry", Ross, Lee and Belgrader, Anal. Chem. 1997, 69, 4197-4202. These documents are incorporated herein by reference.
Computational Aspects
An aspect of the present invention provides a method for calculating the number and nature of mass labels to be incorporated in each monomer of each oligomer. In a preferred embodiment this method is implemented using a computer or other software-controlled programmable processing device, although non-computer implementations are envisaged. The method will be described, by way of example, using the flow diagrams of Figures 6, 7 and 10 to 12. Although those skilled in the art will be able to implement the routines described in a number of ways, the data tables of Figures 5A and 5B and Figures 8 and 9 will be used to assist the description. In a preferred embodiment these data tables are implemented as data files; however, any data structures and/or data manipulation techniques known in the art may be applied. The abbreviation "m.w." is used for molecular weight in some of the Figures.
Referring now to Figure 5A, a set of oligomer data is shown. This data can be represented in a number of ways and contains a representation of each of the oligomers to be included in the repertoire. In this table, the molecular weight of each oligomer is also presented, although this can be calculated from the sequence of each oligomer. Figure 5B shows a table for receiving data from the routines described in the flow diagrams of Figures 6, 7 and 10 to 12. In the table, entries are available for the sequence data for each oligomer and for the combined molecular weight of the oligomer and any modifying groups assigned to the oligomer, along with an entry for data representing the modifying group and its position. In this example, since the oligomers are hexamers, there are six data fields available for the modifying groups; an entry in the first field, e.g. "Mod1", would represent the modifying group "Mod1" assigned to the first monomer in the hexamer. An entry in the third field, e.g. "Mod2", would represent the modifying group "Mod2" assigned to the third monomer in the hexamer, and so on. A representation which can record the oligomer sequence, the molecular weight and the nature and position of each modifying group on each oligomer is thus provided. Many equivalent tables are envisaged. In one embodiment the input table of Figure 5A and output table of Figure 5B are combined as a single data table.
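One possible in-memory form of a row of the output table of Figure 5B is sketched below. The class name `RepertoireEntry`, its field names and the example mass are hypothetical; the text only requires that the sequence, the combined molecular weight and one modifying-group field per monomer be recorded. Here `None` stands for an empty modifying-group field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RepertoireEntry:
    """One row of an output table like Figure 5B (hypothetical layout)."""
    sequence: str
    combined_mw: float
    # One slot per monomer; None means no modifying group at that position.
    labels: List[Optional[str]] = field(default_factory=list)

    def __post_init__(self):
        if not self.labels:
            self.labels = [None] * len(self.sequence)
        if len(self.labels) != len(self.sequence):
            raise ValueError("one modifying-group field is needed per monomer")


# Mod1 on the second monomer of the first hexamer; the mass is illustrative only.
example_entry = RepertoireEntry(
    "AAGGCT", 1832.26, [None, "Mod1", None, None, None, None]
)
```

Keeping the label list the same length as the sequence lets the same structure serve hexamers, 9-mers or 12-mers without a fixed number of columns.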
Referring now to Figure 6, data representing a first oligomer to be represented in the repertoire is read from the input table, or oligomer data table, of Figure 5A. At step 100 it is determined whether or not the oligomer needs to be labelled. If the oligomer does not need to be labelled, the data representing the oligomer is processed at step 102 by entering the oligomer sequence and molecular weight in the output table of Figure 5B and recording zeros in each of the six fields for the modifying groups to indicate that the oligomer is not to be labelled. If the oligomer does need to be labelled, at step 104 the molecular weight of one or more mass labels is added to the molecular weight of the oligomer to give a combined molecular weight which is unique within the repertoire. The oligomer sequence and combined molecular weight are entered in the output table and, at step 106, the one or more mass labels are assigned to one or more of the monomers in the oligomer by, for example, entering corresponding data in one or more of the six fields of the table of Figure 5B. The process is repeated for each oligomer in the repertoire.
Referring briefly to Figure 8, there is shown an input table containing mass label data. Referring now to Figure 7, this Figure shows how step 104 of Figure 6 is implemented. The molecular weight of the oligomer is taken from the oligomer data table and the molecular weight of one or more mass labels is taken from the mass label data table. The mass label molecular weight is added to the molecular weight of the oligomer at step 108. At step 110 it is determined whether the combined molecular weight is unique. If the combined mass is not unique, at step 112 the molecular weight of an additional mass label is added to the combined molecular weight or, alternatively, the molecular weight of an alternative mass label, again read from the mass label data table, is added to the molecular weight of the oligomer and, at step 110, it is determined whether the new combined molecular weight is unique within the repertoire. Steps 112 and 110 are repeated until a combined molecular weight which is unique is found. Once a unique combined molecular weight is found, the data is processed at step 114 by outputting the oligomer sequence, the combined molecular weight and the modifying group information to the output table of Figure 5B.
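The loop of steps 108 to 114 can be sketched as follows. This is a minimal illustration under stated assumptions: masses are integer nominal values so that equality tests are exact, the label table already contains combinations as single entries, and the names `find_unique_mass`, `used_masses` and the 30 Da value for Mod2 are hypothetical.

```python
def find_unique_mass(oligomer_mw, label_table, used_masses):
    """Return (labels, combined_mw) for the first table entry giving a unique mass.

    label_table: list of (tuple of label names, label mass) pairs.
    used_masses: set of masses already present in the repertoire.
    """
    for labels, label_mw in label_table:
        combined = oligomer_mw + label_mw   # step 108, or step 112 on later passes
        if combined not in used_masses:     # step 110: uniqueness check
            return labels, combined         # step 114: ready to be output
    raise ValueError("no entry in the label table gives a unique mass")


# Mod1 = 16 Da and Mod1 + Mod3 = 56 Da follow the text; Mod2 = 30 Da is assumed.
example_table = [(("Mod1",), 16), (("Mod2",), 30), (("Mod1", "Mod3"), 56)]
example_result = find_unique_mass(1816, example_table, used_masses={1832})
```

With 1832 Da already taken, the first entry is rejected and the second is accepted. Adding each table entry to the original oligomer mass, rather than to the previous combined mass, avoids the separate subtraction step.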
Referring again to Figure 8, the table is constructed with entries for the masses for all of the different combinations of mass labels or modifying groups. The first entry is for the first modifying group, Mod1, with a molecular weight of 16 Da and the second entry for the second modifying group Mod2. Entries are also present for combinations such as the combination of Mod1 and Mod3 with a molecular weight of 56 Da. Referring back to Figure 7, at step 108 the first mass in the table is added. If this does not give a unique molecular weight, at step 112 the first mass is subtracted and the second mass in the table is added. If this does not give a unique molecular weight, the next mass in the table is used, then the next and the next and so on until a unique molecular mass is arrived at.
Figure 9 shows an alternative table which has the molecular weights listed in increasing order and is used in a preferred embodiment. In other embodiments, the relative differences between consecutive weights of modifying groups in the table are used instead of absolute values, so that the step of subtracting the preceding mass does not have to be performed.
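A table of label combinations sorted by increasing mass, together with the differences between consecutive entries, might be built as sketched here. Mod1 = 16 Da is stated in the text and Mod3 = 40 Da follows from the 56 Da combination; Mod2 = 30 Da is an assumed value. The function names are hypothetical, and for brevity the sketch only combines distinct labels, although the same modifying group may be used more than once on an oligomer.

```python
from itertools import combinations


def build_label_table(label_mw, max_labels=2):
    """List every combination of up to max_labels distinct labels, sorted by mass."""
    table = []
    for size in range(1, max_labels + 1):
        for combo in combinations(sorted(label_mw), size):
            table.append((combo, sum(label_mw[name] for name in combo)))
    table.sort(key=lambda entry: entry[1])
    return table


def mass_increments(table):
    """Differences between consecutive masses, starting from the unlabelled state."""
    increments, previous = [], 0
    for _, mass in table:
        increments.append(mass - previous)
        previous = mass
    return increments


example_labels = {"Mod1": 16, "Mod2": 30, "Mod3": 40}
sorted_table = build_label_table(example_labels)
increments = mass_increments(sorted_table)
```

Stepping through `increments` and adding each value to a running combined mass visits the same masses as the sorted table, without subtracting the preceding label mass each time.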
An embodiment of a method in accordance with the invention is shown in more detail in Figure 10. The oligomer data for a first oligomer is read from table 5 A. Since this is the first oligomer it can be taken as having a unique molecular weight and no mass labels need be assigned. The oligomer sequence and molecular weight, at step 116, are stored in the table 5B. The oligomer data for a second oligomer is read, at step 118, from table 5 A. The molecular weight of the second oligomer is compared, at step 120, with the molecular weight of each of the oligomers in table 5B, at this stage being only the first oligomer. If the molecular weight is not unique the molecular weight of a first
mass label is read from table 9 and, at step 122, added to the molecular weight of the oligomer. The combined mass is then compared, at step 124, with the molecular weight of each of the oligomers in table 5B. If the combined molecular weight is unique, the oligomer sequence, the combined molecular weight and the modifying group information are stored, at step 128, in table 5B. Otherwise, at step 126, the molecular weight of another mass label, using a procedure as described with reference to Figure 7, is read from the mass label data and added to the molecular weight of the oligomer, and steps 124 and 126 are repeated until a unique molecular weight is found. At step 132, if there are further oligomers to be processed, steps 118 onwards are repeated. If a unique molecular weight is found at step 120, step 130, corresponding to step 128, is performed. The output table of Figure 5B is thus filled with an entry for each oligomer to be represented in the repertoire. The entry comprises the oligomer sequence, a unique molecular weight and data representing modifying groups assigned to particular monomers of the oligomer. The data in this table, or an equivalent structure, is then used to construct a repertoire in accordance with the invention. The data is also used to correlate the results of a mass analysis of the repertoire with the sequences of the oligomers, since the data comprises each oligomer sequence associated with a unique molecular weight.
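By way of illustration only, the process of Figure 10 might be sketched in Python as below. The function name `build_output_table`, the tuple layout of the rows and the error raised when the label table is exhausted are assumptions made for the example. The description does not state what happens if no label yields a unique weight.

```python
def build_output_table(oligomers, label_table):
    """Sequentially assign mass labels in the manner of Figure 10.

    oligomers:   list of (sequence, molecular_weight) rows, as in table 5A.
    label_table: list of (label_name, mass) rows, ascending, as in Figure 9.
    Returns a list of (sequence, final_weight, label_name_or_None) rows.
    """
    output = []
    used = set()
    for sequence, mw in oligomers:                  # steps 118 and 132
        label, weight = None, mw
        if weight in used:                          # step 120
            for name, mass in label_table:          # steps 122 and 126
                if mw + mass not in used:           # step 124
                    label, weight = name, mw + mass
                    break
            else:
                # Assumption: treat an exhausted label table as an error.
                raise ValueError(f"no unique weight available for {sequence}")
        output.append((sequence, weight, label))    # steps 116, 128 and 130
        used.add(weight)
    return output
```

The first oligomer is always stored unlabelled because the set of used weights is empty when it is processed, which corresponds to step 116.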
Figures 11 and 12 illustrate an alternative method of constructing table 5B. Referring to Figure 11, the molecular weight of a first oligomer from table 5A is read at step 134. At step 136, the molecular weight is compared with the molecular weights of each of the other oligomers in table 5A. If the molecular weight is unique, no mass labelling is required and, at step 140, an indication is stored that the oligomer molecular weight is unique. If the molecular weight is not unique, an indication is stored, at step 138, that the oligomer molecular weight is not unique. The process is repeated for each oligomer in table 5A.
Data representing each oligomer in table 5A with a unique molecular weight is transferred to table 5B. An entry is made for the oligomer sequence and molecular weight and a zero entry is made in each of the six fields for the modifying groups to indicate that the oligomer is not to be labelled.
For each oligomer in table 5B with a non-unique molecular weight, the process shown in Figure 12 is performed. The molecular weight of a first mass label is read from table 9 and, at step 142, added to the molecular weight of the oligomer. The combined mass is then compared, at step 144, with the molecular weight of each of the oligomers in table 5B. If the combined molecular weight is unique, at step 152, the oligomer sequence, the combined molecular weight and the modifying group information is stored in table 5B. Otherwise, at step 146, the molecular weight of a next mass label is read from the mass label data and added to the molecular weight of the oligomer and, at step 148, the combined mass is compared with the molecular weight of each of the oligomers in table 5B. Steps 146 and 148 are repeated until a unique molecular weight is found and, at step 150, the oligomer sequence, the combined molecular weight and the modifying group information is stored in table 5B. As with the process described with reference to Figure 10, table 5B is thus filled with an entry for each oligomer to be represented in the repertoire. Again, the entry comprises the oligomer sequence, a unique molecular weight and data representing modifying groups assigned to particular monomers of the oligomer. The data in this table, or an equivalent structure, is then used to construct a repertoire in accordance with the invention. The data is also used to correlate the results of a mass analysis of the repertoire with the sequences of the oligomers, since the data comprises each oligomer sequence associated with a unique molecular weight.
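The two-stage method of Figures 11 and 12 could be sketched along the following lines. This is a hedged Python illustration: the function name `build_output_table_two_pass` and the row layout are hypothetical, and a `collections.Counter` is used here in place of the stored unique/not-unique indications of steps 138 and 140.

```python
from collections import Counter


def build_output_table_two_pass(oligomers, label_table):
    """Flag non-unique weights first (Figure 11), then label them (Figure 12).

    oligomers:   list of (sequence, molecular_weight) rows.
    label_table: list of (label_name, mass) rows.
    Returns a list of (sequence, final_weight, label_name_or_None) rows.
    """
    # Figure 11: count how often each unlabelled weight occurs.
    counts = Counter(mw for _, mw in oligomers)
    output, used = [], set()

    # Oligomers with an already unique weight are transferred unlabelled.
    for sequence, mw in oligomers:
        if counts[mw] == 1:
            output.append((sequence, mw, None))
            used.add(mw)

    # Figure 12: every flagged oligomer receives a label giving a unique weight.
    for sequence, mw in oligomers:
        if counts[mw] == 1:
            continue
        for name, mass in label_table:
            if mw + mass not in used:
                output.append((sequence, mw + mass, name))
                used.add(mw + mass)
                break
        else:
            # Assumption: an exhausted label table is reported as an error.
            raise ValueError(f"no unique weight available for {sequence}")
    return output
```

Unlike the sequential sketch for Figure 10, every oligomer flagged as non-unique is labelled here, including the first of a group sharing the same weight.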
It will be appreciated that the processes described with reference to Figures 6 to 12 are examples and that there are many equivalent or alternative embodiments within the scope of the invention. In particular the oligomer data of Figure 5A does not have to be provided as discrete oligomer sequences, but can be derived from, for instance, a representation of a sequence or a genome.
It will be appreciated that embodiments of the invention described above are implementable using a software-controlled programmable processing device such as a
Digital Signal Processor, microprocessor, other processing devices, data processing apparatus or computer system, and it will be appreciated that a computer program for
configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention. The computer program may be embodied as source code and undergo compilation for implementation on a processing device, apparatus or system, or may be embodied as object code, for example. The skilled person would readily understand that the term computer in its most general sense encompasses programmable devices such as referred to above, and data processing apparatus and computer systems.
Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory or magnetic memory such as disc or tape and the processing device utilises the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present invention.
For completeness and referring now to Figure 13, there is shown a schematic and simplified representation of an illustrative implementation of a data processing apparatus in the form of a computer system. As shown in Figure 13, the computer system comprises various data processing resources such as a processor (CPU) 30 coupled to a bus structure 38. Also connected to the bus structure 38 are further data processing resources such as read only memory 32 and random access memory 34. A display adaptor 36 connects a display device 18 to the bus structure 38. One or more user-input device adapters 40 connect the user-input devices, including the keyboard 22 and mouse 24 to the bus structure 38. An adapter 41 for the connection of the printer 21 may also be provided. One or more media drive adapters 42 can be provided for connecting the media drives, for example the optical disk drive 14, the floppy disk drive 16 and hard disk drive 19, to the bus structure 38. One or more telecommunications adapters 44 can be provided thereby providing processing resource interface means for connecting the computer system to one or more networks or to other computer systems. The communications adapters 44 could include a local
area network adapter, a modem and/or ISDN terminal adapter, or serial or parallel port adapter etc, as required.
It will be appreciated that Figure 13 is a schematic representation of one possible implementation of a data processing apparatus in the form of a computer system. It will be appreciated, from the description of embodiments of the present invention, that the computer system in which the invention could be implemented, may take many forms. For example, rather than the computer system comprising a display device 18 and printer 21, it may be merely necessary for the computer system 10 to comprise a processing unit. The processing unit may be associated with or built into another piece of apparatus such as a nucleic acid probe synthesis machine (Figure 14) or a mass spectrometer to form a system (Figure 15).
Uses of the Invention
The primary application of the invention described herein is the development of a new method to enable the simultaneous measurement of the expression of a plurality of genes, or the detection of a mutation in one or more genes or other nucleic acid sequences, including mRNA. The method is an alternative to and/or complementary to existing array technology. The method directly measures the presence and quantity of specific RNA or DNA molecules in a solution or piece of tissue (in situ) and, in contrast to the present techniques, does not rely on a PCR amplification step.
Existing microarray technology makes use of a collection of DNA molecules (oligonucleotides or cDNA), that are attached to a solid surface as an ordered array using machines specifically designed for that purpose. The array is subsequently hybridised with a mixture of probes that have been obtained by using an amplification reaction of RNA (or DNA for mutation analysis) isolated from the tissue or cell type under investigation and a control tissue or cell type. Usually two different fluorescent labels are incorporated in the experimental and control probes which are measured optically after hybridisation to the array. Using bioinformatics it is then calculated which RNA differences (or mutations or sequence per se) there are between two
samples or a collection of samples (for example, obtaining gene expression profiles of healthy vs. cancerous tissues, infected vs. normal organisms, a first developmental stage vs. a second, the identification of pathogens, and the like).
In the case of the detection of mutations or the presence of particular sequences there is usually an ordered set of oligonucleotides on an array representing an entire sequence of a gene. This technology is becoming more commonplace in the art and will become standard technology. The weakest point is the amplification step (using PCR) which is required in order to obtain fluorescent signals. As a result repeated analyses are often required. Moreover, amplification introduces bias in the detection procedure by representing different sequences unevenly in the amplified sequence pool.
The invention offers an alternative to the existing technology. It does not use an amplification (PCR) step, but instead a direct analysis is performed. The RNA or DNA in question is fixed to a solid support (for example, glass) in purified form or in its native state as a piece of tissue, cultured cells etc. It is subsequently hybridised with a repertoire of oligonucleotides, each member of which has a different molecular weight characteristic for its sequence as described above. The hybridised oligonucleotides are isolated and analysed by mass spectrometry to determine which molecular weight species are present in which quantity after hybridisation. Thus the identity and quantity of every oligonucleotide is determined, and information about the sequences present in the sample thereby obtained.
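The correlation of measured masses with oligonucleotide sequences might be sketched as follows. This Python fragment is illustrative only: the function name `identify_peaks`, the matching tolerance of 0.5 Da and the use of summed peak intensity as a measure of quantity are assumptions and are not taken from the description.

```python
def identify_peaks(peaks, output_table, tolerance=0.5):
    """Map measured mass peaks back to oligomer sequences.

    peaks:        list of (measured_mass, intensity) pairs from the spectrum.
    output_table: list of (sequence, unique_weight, label) rows, as in table 5B.
    tolerance:    assumed mass-matching window in Da (hypothetical value).
    Returns a dict of sequence -> summed intensity for the matched peaks.
    """
    results = {}
    for measured, intensity in peaks:
        for sequence, weight, _label in output_table:
            if abs(measured - weight) <= tolerance:
                # Each weight in the table is unique, so the first match suffices.
                results[sequence] = results.get(sequence, 0.0) + intensity
                break
    return results
```

Peaks that match no entry in the table are ignored in this sketch; in practice the tolerance would need to be narrower than the smallest spacing between weights in the repertoire.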
The invention thus provides a method for the direct sampling of one or more genetic features, simultaneously, in a target tissue or cell. The nucleic acids of the target may be analysed in purified or crude form, using oligonucleotide probes which can be accurately identified by mass analysis. Thus, the presence of mutations, polymorphisms, amplifications, deletions and other genetic conditions may be readily determined, in situ, without recourse to PCR or other amplification techniques.
Example
A method according to the invention has been exemplified using globin RNA and complementary oligonucleotides. Even in the absence of optimisation steps, a result can be achieved. Two oligonucleotides were used, having the sequence AAAGTGATGGGCCAGCACA for the beta globin gene and TTGAAAGCTCTGAATTCATGGGC for the gamma globin gene.
Oligonucleotides were labelled using standard procedures and used in a parallel experiment with unlabelled oligonucleotides. The autoradiogram shows the result of the oligonucleotides eluted from foetal liver cell RNA isolated from a transgenic mouse containing a human beta globin locus and expressing the gamma and beta genes, immobilised on a glass slide. S1 analysis reveals that beta globin RNA is present in higher amounts than gamma globin RNA.
The peak diagram is a read-out from the mass spectrometer separating the oligonucleotides by molecular weight. The results are shown in Figures 16 and 17.
In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
The scope of the present disclosure includes any novel feature or combination of features disclosed therein either explicitly or implicitly or any generalisation thereof irrespective of whether or not it relates to the claimed invention or mitigates any or all of the problems addressed by the present invention. The applicant hereby gives notice that new claims may be formulated to such features during the prosecution of this
application or of any such further application derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.
All publications mentioned herein are incorporated by reference.