EP1492888A2

EP1492888A2 - Analysis of mixtures of nucleic acid fragments and of gene expression

Info

Publication number: EP1492888A2
Application number: EP03742963A
Authority: EP
Inventors: Achim Fischer
Original assignee: Axaron Bioscience AG
Current assignee: Sygnis Pharma AG
Priority date: 2002-02-27
Filing date: 2003-02-27
Publication date: 2005-01-05
Also published as: AU2003210377A1; US20060029937A1; AU2003210377A8; DE10208333A1; WO2003072819A2; WO2003072819A3; CA2480320A1

Abstract

The invention relates to a method for analysing nucleic acid fragments, said method comprising the following steps: a) at least one mixture of nucleic acid fragments is prepared, said mixture having at least one recognition site for a restriction endonuclease cutting outside its recognition site, b) at least part of the mixture of nucleic acid fragments from step (a) is incubated with at least one restriction endonuclease having a cutting site outside its recognition site, and c) at least one nucleotide of the cut nucleic acid fragments from step (b) is identified, and optionally other fragment-specific characteristics of the cut nucleic acid fragments from step (b) are identified, said identification steps being simultaneously carried out for a plurality, or all, of the nucleic acid fragments.

Description

Analysis of nucleic acid fragment mixtures

The invention relates to a method for analyzing nucleic acid fragment mixtures and the use of the method for gene expression analysis.

Methods for sequencing nucleic acid mixtures, such as those obtained, for example, by "rewriting" (reverse transcribing) mRNA molecules into cDNA molecules, are known from the prior art. When numerous different mRNA molecules from a cell or tissue are reverse transcribed, the resulting cDNA molecules are usually cloned, mostly in plasmid or phage vectors, and then sequenced "clonally" (Sambrook, Maniatis, Fritsch, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, NY, 1989). Sequencing usually proceeds either by strand synthesis according to the Sanger chain-termination principle or by strand degradation in Maxam and Gilbert sequencing. In either case, the different molecules are separated by transforming them, in the form of plasmids, into bacterial cells, and the separated molecules are then propagated into identical copies, so that "pure" signals (i.e., signals originating from identical molecules) are obtained during sequencing. This procedure is suitable, for example, for so-called "EST sequencing" (EST = expressed sequence tag), in which numerous clones obtained in this way are partially sequenced and the resulting sequences are catalogued. Unless the sequenced library has been normalized, the relative frequency with which a particular cDNA or EST is sequenced reflects the abundance of the corresponding transcript, so EST sequencing can be used not only to detect expressed genes but also to compare expression levels between different biological samples (cf. Lee et al., Proc. Natl. Acad. Sci. USA 92: 8303-8307 (1995)).
However, EST sequencing for comparative expression profiling is very laborious, precisely because of this relationship between relative transcript abundance and relative clone abundance: some transcripts (for example, those of so-called housekeeping genes) occur at a much higher frequency than others and are therefore cloned more often, so that abundant transcripts may need to be sequenced a few hundred to a few thousand times before less abundant transcripts are captured.

Several alternative methods have been described in which not complete cDNA molecules but only fragments thereof are analyzed, in particular RAP-PCR (RNA arbitrarily primed PCR; Welsh et al., Nucleic Acids Res. 20: 4965-70) and differential display (Liang and Pardee, Science 257: 967-971), in which transcript fragments are amplified by PCR using short primers of arbitrarily chosen sequence. These fragments, whose lengths can vary greatly from transcript to transcript, are separated and detected according to size by gel electrophoresis. Here, at least in theory, the abundance of a transcript is no longer represented by the frequency of an event (for example, the frequency with which a clone representing this transcript occurs) but by the intensity of the corresponding band. This largely eliminates the redundancy characteristic of prior-art EST sequencing and thus reduces costs. To sequence individual fragments, the respective bands are isolated from the gel, reamplified by PCR and cloned. More modern variants of this method, as described, for example, in EP 0 743 367, generate the fragments by restriction digestion of double-stranded cDNA, which significantly increases the reproducibility of the fragment patterns obtained. However, such processes still have the disadvantage that bands isolated from a gel often yield products contaminated with other, undesired DNA fragments. Furthermore, isolating and cloning individual bands is very labor-intensive, so that identifying fragments without prior isolation would be highly desirable. Sutcliffe et al. (Proc. Natl. Acad. Sci. USA 97: 1976-1981) describe a method called "TOGA" in which mRNA molecules are converted into cDNA restriction fragments that are separated by capillary gel electrophoresis.
For fragments of interest (those that, when different preparations are compared, indicate differentially expressed genes through differences in the intensity of the corresponding bands), a signature is defined, i.e., a collection of fragment-specific information such as fragment length, partial nucleotide sequence, position and/or orientation of the fragment within the starting cDNA, etc. In TOGA, the signature consists of an 8-bp partial sequence known for each fragment, together with the distance of this sequence from the 3' end of the fragment. Using this signature, genes with the same signature can be identified in sequence databases. If the signature is error-free, the corresponding genes can be assigned to cDNA fragments without the fragments having to be isolated and sequenced. However, the method has disadvantages that make these signatures unreliable: (1) the identification of 4 nucleotides of the 8-bp sequence, which is carried out with "invasive" or "selective" amplification primers, is imprecise, because primers whose selective portion (namely the nucleotides at the 3' end) is not perfectly complementary to the template are often also incorporated (for the concept of complementarity, see the base-pairing rules known from the literature, for example in Ausubel et al., Current Protocols in Molecular Biology (1999), John Wiley & Sons); and (2) determining fragment length from electrophoretic mobility is inaccurate, since the mobility of a fragment depends not only on its length but also on its G/C content and exact sequence (cf. Forensic Sci. Int. 94, 155-6 [1998]). As a result, an incorrect length is often assumed. An incorrect length and/or an incorrect sequence, however, means that the signature determined for a given fragment does not point to the gene to be identified; instead, the corresponding database search yields either no result or an incorrect one.
Similar limitations apply to a comparable method called "gene calling", in which cDNA is subjected to double digests with different combinations of restriction endonucleases (Shimkets et al., Nature Biotechnol. 17, 798-803 [1999]). The resulting fragments are separated by gel electrophoresis; their lengths, and from these the distances between the two restriction endonuclease recognition sites that define each fragment, are determined; and signatures are generated, consisting of the sequence of the first recognition site, the sequence of the second recognition site and the assumed distance between the two recognition sites (in base pairs). These signatures are used in database searches to assign the detected fragments to the genes from which they derive. Here too, the large uncertainty in determining fragment sizes from fragment mobilities results in a high proportion of incorrect assignments of database entries to detected fragments.
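The effect of sizing errors on gene-calling signatures can be illustrated with a small in-silico double digest. The following is a minimal sketch under stated assumptions: only the top strand is scanned, the two recognition sequences (GAATTC and GGATCC) are chosen purely for illustration and are not claimed to be those used by Shimkets et al., and the two "genes" are hypothetical toy sequences. The helper names `double_digest_signatures` and `lookup` are likewise hypothetical.

```python
import re
from typing import Dict, List, Tuple

# A gene-calling-style signature: (first site, second site, distance in bp).
Signature = Tuple[str, str, int]


def site_positions(seq: str, site: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def double_digest_signatures(seq: str, site_a: str, site_b: str) -> List[Signature]:
    """Signatures of fragments bounded by one site of each enzyme (top strand only)."""
    hits = sorted([(p, site_a) for p in site_positions(seq, site_a)]
                  + [(p, site_b) for p in site_positions(seq, site_b)])
    return [(s1, s2, p2 - p1)
            for (p1, s1), (p2, s2) in zip(hits, hits[1:]) if s1 != s2]


def lookup(measured: Signature, database: Dict[str, List[Signature]],
           tolerance: int) -> List[str]:
    """Genes whose in-silico signatures match the measured one within +/- tolerance bp."""
    s1, s2, dist = measured
    return sorted(gene for gene, sigs in database.items()
                  for a, b, d in sigs
                  if a == s1 and b == s2 and abs(d - dist) <= tolerance)


# Hypothetical toy "database": two genes whose fragments differ by only 3 bp.
genes = {
    "gene1": "GAATTC" + "A" * 100 + "GGATCC",
    "gene2": "GAATTC" + "A" * 103 + "GGATCC",
}
database = {g: double_digest_signatures(s, "GAATTC", "GGATCC") for g, s in genes.items()}

measured = ("GAATTC", "GGATCC", 107)  # size read from mobility, 1 bp off for gene1
print(lookup(measured, database, tolerance=3))  # ['gene1', 'gene2']: ambiguous
print(lookup(measured, database, tolerance=1))  # ['gene1']
```

A tolerance wide enough to absorb realistic mobility errors also admits neighbouring entries, which is the ambiguity described above.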

It was therefore the object of the present invention to provide a method for assigning signatures to nucleic acid fragments present in a mixture that does not have the disadvantages of the prior art. According to the invention, this object is achieved by a method for analyzing nucleic acid fragments, comprising the steps:

(a) provision of a mixture of nucleic acid fragments having at least one recognition site for a restriction endonuclease that cuts outside its recognition site,

(b) incubation of at least a subset of the mixture of nucleic acid fragments from step (a) with at least one restriction endonuclease whose cleavage site lies outside its recognition site,

(c) identification of one or more nucleotides of the cut nucleic acid fragments from (b), the identification being carried out simultaneously for several or all nucleic acid fragments.

Furthermore, the object according to the invention is achieved by a method for analyzing nucleic acid fragments, comprising the steps:

(a) provision of a mixture of nucleic acid fragments having at least one recognition site for a restriction endonuclease that cuts outside its recognition site,

(b) incubation of at least a subset of the mixture of nucleic acid fragments from step (a) with at least one restriction endonuclease whose cleavage site lies outside its recognition site and which creates overhanging ends of known position and length but unknown sequence,

(c) identification of one or more nucleotides of the overhanging ends of the cut nucleic acid fragments from (b), the identification being carried out simultaneously for several or all nucleic acid fragments.

The mixture of nucleic acid fragments preferably consists of restriction fragments of cDNA or genomic DNA, optionally amplified. All or some of the fragments can be flanked by sequence regions common to all or some fragments. These common sequence regions can be, for example, linkers or adapters attached to the fragments, i.e., double-stranded nucleic acid fragments readily obtained by hybridizing two essentially or at least partially complementary oligonucleotides. Adapters typically have a length between 5 and 200 nucleotides, preferably between 10 and 80 nucleotides, particularly preferably between 15 and 40 nucleotides. The fragments preferably have a characteristic size distribution with a smallest size, a largest size and an average size, the size being influenced or determined by the positions and/or the frequency of the recognition site(s) for the restriction endonuclease(s) used for fragment generation; the length of any attached linker or adapter must of course also be taken into account. In a preferred embodiment, a mixture of nucleic acid fragments, preferably double-stranded cDNA, is cut with at least one restriction endonuclease, which preferably has a four-base recognition sequence. Examples of suitable restriction endonucleases are AluI, BfaI, BstUI, ChaI, Csp6I, CviJI, CviJI*, DpnI, DpnII, HaeIII, HhaI, HinP1I, HpaII, HpyCH4IV, HpyCH4V, MboI, MseI, MspI, NlaIII, Sau3AI, TaiI, TaqI and Tsp509I. Linker molecules are often attached, generally by enzymatic ligation, to one or both ends of the fragments thus obtained. This can be done without post-treatment of the fragments if the fragment ends and linker ends are compatible with one another, i.e., are blunt or have mutually complementary overhangs. However, the fragment ends can also be post-treated to achieve compatibility.
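The size distribution described above can be estimated in silico. For a random sequence with equal base frequencies, a four-base site occurs on average once every 4^4 = 256 bp. The sketch below is illustrative only: it assumes cuts on the top strand only, ignores overhang geometry, treats only cDNA-internal fragments (cut on both sides, so adapters are added to both ends), and uses a hypothetical toy sequence with MseI (T^TAA) and 20-nt adapters.

```python
import re
from statistics import mean
from typing import List


def internal_fragment_sizes(seq: str, site: str, cut_offset: int,
                            adapter_len: int) -> List[int]:
    """Sizes of cDNA-internal restriction fragments after adapters are ligated
    to both ends. cut_offset is the top-strand cut position inside the site
    (MseI, T^TAA -> 1). Overhang geometry is ignored in this sketch."""
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    return [b - a + 2 * adapter_len for a, b in zip(cuts, cuts[1:])]


# Hypothetical toy sequence with three MseI sites (TTAA) and 20-nt adapters.
toy = "GG" + "TTAA" + "C" * 10 + "TTAA" + "G" * 20 + "TTAA" + "CC"
sizes = internal_fragment_sizes(toy, "TTAA", cut_offset=1, adapter_len=20)
print(sizes, min(sizes), max(sizes), mean(sizes))  # smallest, largest, average size
```

Applied to a real cDNA collection, the same function would give the smallest, largest and average fragment sizes referred to in the text.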
For example, single-stranded fragment ends can be removed with a nuclease or, in the case of 5' overhangs, filled in with a polymerase and thus converted into blunt ends if linkers with blunt ends are to be attached. Another example of post-treatment of fragment ends is partial filling-in, which can prevent the usually undesired ligation of two fragment ends to one another. For example, the palindromic and thus self-complementary overhang 3'-CTAG-5' generated by the restriction endonuclease Sau3AI can be converted into the no longer self-complementary overhang 3'-TAG-5' by treatment with a polymerase in the presence of dGTP. Only linkers with the complementary overhang 5'-ATC-3' can then be attached to such an overhang; ligation of two fragment ends to one another is no longer possible. After the linkers have been attached, an amplification is optionally carried out, either with one or more PCR primers directed against the attached linkers or, to represent a desired subset of fragments, with one or more PCR primers directed against the attached linkers plus a PCR primer directed against a terminal region of the original nucleic acid fragments, preferably of the starting cDNA molecules. Suitable terminal regions are, for example, the region introduced by the primer used for cDNA synthesis, or a region artificially attached to the 5' end of the mRNA used for cDNA synthesis or to the 3' end of the first-strand cDNA. In the first case, "cDNA-internal" fragments are amplified, i.e., fragments that had restriction-generated ends on both sides before linker attachment; in the second case, "terminal" fragments are amplified, which had a restriction-generated end on one side before linker attachment and whose other end is identical to the 3' end or the 5' end of the original nucleic acid fragments or of the starting cDNA.
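The Sau3AI partial fill-in can be checked with a short sketch. It assumes idealized, purely template-directed extension of the recessed 3' end (no exonuclease activity), stopping at the first base whose complementary dNTP is absent; overhangs are written 5'->3', so 5'-GATC-3' is the same overhang as 3'-CTAG-5'. The function names are illustrative, not part of the described method.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def partial_fill_in(overhang_5to3: str, dntps: set) -> str:
    """Remaining 5' overhang (written 5'->3') after the recessed 3' end is
    extended using only the dNTPs in `dntps` (idealized: no exonuclease,
    extension stops at the first base that cannot be copied)."""
    remaining = overhang_5to3
    # The polymerase copies the overhang starting from its 3'-most base,
    # i.e. the base adjacent to the double-stranded region.
    while remaining and COMPLEMENT[remaining[-1]] in dntps:
        remaining = remaining[:-1]
    return remaining


sau3ai = "GATC"                                      # 5'-GATC-3' = 3'-CTAG-5'
print(sau3ai == reverse_complement(sau3ai))          # True: palindromic, self-ligatable
filled = partial_fill_in(sau3ai, {"G"})              # "GAT" = 3'-TAG-5'
print(filled, filled == reverse_complement(filled))  # GAT False
print(reverse_complement(filled))                    # ATC: compatible linker overhang
```

An overhang is self-compatible exactly when it equals its own reverse complement, which is why the filled-in 3'-TAG-5' end only accepts linkers with a 5'-ATC-3' overhang.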
In this embodiment, the cDNA primer used is preferably an oligo-dT primer, which can be extended by one or more nucleotides at its 3' end and/or at its 5' end, at least some of which are not T nucleotides. If two or more restriction endonucleases generating different ends are used for fragment generation, different linkers can be used in the subsequent step, some attaching to one type of end and others to another type of end. If these linkers differ not only in their ends, and thus in their compatibility with (i.e., attachability to) the fragment ends, but also in their remaining sequence, a suitable choice of primers in a subsequent PCR amplification allows specific fragments to be amplified selectively (those to whose linker sequences the selected primers can bind under the chosen amplification conditions), while certain other fragments (those to whose linker sequences the selected primers cannot bind) remain unamplified. A further possibility for the selective amplification of certain fragments is to use selective ("invasive") primers which, beyond the linker sequence common to all fragments, are extended at their 3' end by one or more additional selective bases (see, for example, EP 0 743 367). A further possibility for selective isolation or amplification that can be used within the method according to the invention is described in WO 94/01582.
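The subsetting effect of selective bases can be sketched as follows. This assumes idealized priming in which a selective primer amplifies a fragment only if its selective 3' bases match the fragment bases adjacent to the common linker perfectly (in practice, as noted above for TOGA, mismatched primers are sometimes extended). The fragment pool is hypothetical.

```python
from itertools import product
from typing import List


def select_fragments(fragments: List[str], selective_bases: str) -> List[str]:
    """Fragments (written from the base adjacent to the common linker) that a
    primer carrying these 3' selective bases would amplify, assuming perfect
    discrimination against mismatches."""
    return [f for f in fragments if f.startswith(selective_bases)]


# Hypothetical pool: every possible 2-nt start, followed by a common sequence.
pool = ["".join(p) + "ACGTTGCA" for p in product("ACGT", repeat=2)]
for sel in ("", "A", "AC"):
    subset = select_fragments(pool, sel)
    print(repr(sel), len(subset), "of", len(pool))
```

Under this idealization, each additional selective base reduces the amplified subset to roughly one quarter, and the four primers with different single selective bases together cover the whole pool.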

Restriction endonucleases that cut outside their recognition site are those in which the recognition site (the partial sequence that triggers enzyme activity, usually a region of 4-8 base pairs of double-stranded DNA to which the enzyme binds) and the cleavage site (the region of the DNA double strand in which the sugar-phosphate backbone of the DNA strands is hydrolytically cleaved) are offset from one another on at least one of the two strands forming the double strand. Examples are type IIS restriction endonucleases such as FokI [cutting characteristic GGATG(9/13): the "upper" strand is cut 9 bases away from the recognition site GGATG, the "lower" strand 13 bases away] or BtsI [cutting characteristic GCAGTG(2/0)], or the restriction endonuclease BcgI [cutting characteristic (10/12)CGANNNNNNTGC(12/10): both strands are cut once upstream and once downstream of the recognition site]. Further examples are the restriction endonucleases AarI, AceIII, AloI, AlwI, BaeI, Bbr7I, BbsI, BbvI, BceAI, BcefI, BciVI, BfuAI, BmrI, BplI, BpmI, BpuEI, BsaI, BsaXI, BscAI, BseKIll, BseMII, BsmBI, BsmFI, Bsp24I, BspCNI, BspMI, BsrDI, BstF5I, CjeI, CjePI, EarI, EciI, Eco57I, Eco57MI, FalI, FauI, HaeIV, HgaI, Hin4I, HphI, MboII, MmeI, MnlI, PleI, PpiI, PsrI, RleAI, SapI, SfaNI, Sth132I, StsI, TaqII, TspDTI, TspGWI and Tth111II. To carry out the method according to the invention, preference is given to restriction endonucleases that produce single-stranded overhangs, which can be either 3' overhangs or 5' overhangs. If restriction endonucleases producing blunt ends are to be used (for example MlyI, cutting characteristic GAGTC(5/5), or SspD5I, GGTGA(8/8)), the blunt ends can be converted into overhanging ends in an additional step.
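The meaning of a cutting characteristic SITE(n/m) can be made concrete with a small calculator. This is a sketch under stated assumptions: only the first occurrence of the site on the given strand is considered (a site in reverse orientation would require scanning the reverse complement, which is not done here), and the FokI test sequence is hypothetical, with the site placed in a linker so that the cut falls 9-13 nt away inside the fragment-specific region.

```python
from typing import Optional, Tuple


def type_iis_cut(seq: str, site: str, n: int, m: int
                 ) -> Optional[Tuple[int, int, str, str]]:
    """Cut positions for an enzyme with cutting characteristic SITE(n/m),
    using the first occurrence of SITE on the given (top) strand.
    Positions are 0-based gaps in top-strand coordinates: the top strand is
    cut n nt and the bottom strand m nt downstream of the site's 3' end.
    Returns (top_cut, bottom_cut, overhang type, top-strand bases spanning
    the staggered cut)."""
    start = seq.find(site)
    if start < 0:
        return None
    end = start + len(site)
    top_cut, bottom_cut = end + n, end + m
    if max(top_cut, bottom_cut) > len(seq):
        return None  # cut would fall outside the sequence
    if top_cut < bottom_cut:
        kind = "5'"
    elif top_cut > bottom_cut:
        kind = "3'"
    else:
        kind = "blunt"
    lo, hi = sorted((top_cut, bottom_cut))
    return top_cut, bottom_cut, kind, seq[lo:hi]


# Hypothetical linker carrying GGATG, followed by fragment-specific sequence.
seq = "TTTTGGATG" + "ACGTACGTA" + "CGTT" + "GGGGGG"
print(type_iis_cut(seq, "GGATG", 9, 13))                  # (18, 22, "5'", 'CGTT')
print(type_iis_cut("GCAGTGAATT", "GCAGTG", 2, 0))         # BtsI: 2-nt 3' overhang
print(type_iis_cut("GAGTCAAAAACC", "GAGTC", 5, 5)[2])     # MlyI: blunt
```

The position and length of the overhang follow from n and m alone, whereas its bases (here CGTT) are whatever the fragment happens to contain, which is the "known position and length but unknown sequence" property used in step (c).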
This can be done, for example, by incubation with T4 DNA polymerase in the presence of a selected deoxynucleoside triphosphate: the exonuclease activity of T4 DNA polymerase then degrades one of the two strands in the 3'→5' direction until the first nucleotide of the same kind is reached (for example, up to the first "G" if dGTP is used; see Ausubel et al., Current Protocols in Molecular Biology (1999), John Wiley & Sons). Another type of restriction endonuclease that cuts outside its recognition site comprises enzymes whose recognition site is interrupted by a stretch of arbitrary or largely arbitrary nucleotides. Examples are XcmI (cutting characteristic CCANNNNN/NNNNTGG) and SfiI (cutting characteristic GGCCNNNN/NGGCC). A further special case of restriction endonucleases cutting outside their recognition site is the so-called "nicking endonucleases", which cut only one strand of a nucleic acid double strand. Examples are N.AlwI (GGATCNNNN/N) and N.BstNBI (GAGTCNNNN/N), which cut the sense strand at the position marked "/". If such endonucleases are to be used in the method according to the invention, it must be ensured that, after cutting, the fragments in question are converted into fragments having a single-stranded overhang. This can be achieved, for example, by one of the following two measures: (1) "melting" off a short single strand adjacent to the cut site by alkaline or heat denaturation, the remainder of the fragment remaining double-stranded; (2) incubation with a further restriction endonuclease that also cuts the strand complementary to the strand cut (or still to be cut) by the nicking endonuclease.
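The conversion of a blunt end into an overhang with T4 DNA polymerase and a single dNTP can be modelled very simply. This sketch assumes idealized behaviour: the strand whose 3' end lies at the blunt end is trimmed until its 3'-terminal base matches the supplied dNTP (where removal and re-incorporation balance out), and no other side reactions occur. The sequence and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def t4_trim(strand_5to3: str, dntp: str):
    """Idealized T4 DNA polymerase 3'->5' exonuclease action on the strand whose
    3' end lies at a blunt end, in the presence of a single dNTP: nucleotides
    are removed until the 3'-terminal base equals `dntp`.
    Returns (trimmed strand, resulting 5' overhang of the partner strand, 5'->3')."""
    i = len(strand_5to3)
    while i > 0 and strand_5to3[i - 1] != dntp:
        i -= 1
    removed = strand_5to3[i:]
    # The partner strand opposite the removed bases is now single-stranded.
    return strand_5to3[:i], reverse_complement(removed)


print(t4_trim("ACGTTAAC", "G"))  # ('ACG', 'GTTAA'): 5-nt 5' overhang
```

Unlike a type IIS cut, the length of the resulting overhang here depends on the sequence (the distance to the first G), which is one reason enzymes producing overhangs directly are preferred.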

In the nucleic acid fragments of the mixture from (a), the recognition site for a restriction endonuclease that cuts outside its recognition site is preferably located within the terminal sequence regions common to many or all fragments of the mixture, in particular in the adapters or linkers attached to the fragments. The enzyme and the position of its recognition site are chosen such that the restriction endonuclease(s) make a "proximal" cut, i.e., cut the respective nucleic acid fragment within its fragment-specific region, outside the flanking linker common to all or many fragments. In a particularly preferred embodiment, recognition sites for the restriction endonuclease(s) to be used that occur within individual fragments, i.e., outside the flanking linker regions common to all or many fragments, are protected against recognition by the corresponding restriction endonuclease. According to the prior art, such protection against restriction endonucleolytic cleavage can be achieved, for example, by incorporating methylated nucleotides such as methyl-dCTP, or by using a methylase belonging to the selected restriction endonuclease. For example, BamHI methylase converts recognition sites of the restriction endonuclease BamHI into their C-methylated form, which BamHI no longer recognizes and cuts. CpG methylase methylates CG dinucleotides and thereby, for example, prevents a DNA fragment containing the sequence CGTCTC from being cut by the restriction endonuclease BsmBI (cutting characteristic CGTCTC(1/5)). In any case, these measures ensure that, in the course of a restriction digestion, each nucleic acid fragment present in the mixture is cut at exactly one predetermined position.
Alternatively, the starting nucleic acid molecules (preferably cDNA or genomic DNA) used to generate the nucleic acid fragments from (a) could first be incubated with the restriction endonuclease from step (b) and then, as described above, treated with at least one further, usually frequently cutting restriction endonuclease; linker molecules would then be attached to the ends generated by the latter, and a PCR amplification would be carried out with primers directed against the terminal linker molecules. This procedure ensures that the nucleic acid fragments in step (b) are cut only at the desired locations determined by the attached linker, since fragments with their "own", fragment-internal recognition site for said restriction endonuclease can no longer be amplified after being cut and therefore do not occur in the fragment mixture according to (a).
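Whether a fragment will be cut only at its linker-borne site can be checked by scanning its fragment-specific region for internal sites on both strands. The following is a minimal sketch, assuming FokI (GGATG, reverse complement CATCC) and hypothetical fragment sequences; fragments flagged here are the ones that would either be lost after pre-digestion as described above or would require methylation protection.

```python
import re
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def count_sites(seq: str, site: str) -> int:
    """Occurrences of site on either strand of seq (counted once if palindromic)."""
    fwd = len(re.findall(f"(?={site})", seq))
    rc = reverse_complement(site)
    return fwd if rc == site else fwd + len(re.findall(f"(?={rc})", seq))


def cut_only_at_linker(fragments: List[str], site: str) -> List[str]:
    """Fragment-specific regions without an internal site: only these are cut
    exclusively at the linker-borne site."""
    return [f for f in fragments if count_sites(f, site) == 0]


fragments = ["AAAAGGATGAAAA", "AAAACATCCAAAA", "AAAATTTT"]
print([count_sites(f, "GGATG") for f in fragments])  # [1, 1, 0]
print(cut_only_at_linker(fragments, "GGATG"))        # ['AAAATTTT']
```

Because type IIS sites are usually non-palindromic, the reverse-complement orientation has to be counted as well; the second fragment would otherwise be missed.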

One or more nucleotides of the cut nucleic acid fragments can be identified in several different ways. In particular, three preferred procedures are suitable for this, which should not, however, preclude further procedures:

1. Extension of recessed 3' ends with dideoxynucleoside triphosphates ("ddNTPs"), as used in current Sanger sequencing, or with acyclic nucleotides (i.e., with so-called "terminating nucleotides" or "chain terminators"): each strand that can be filled in is extended by exactly one nucleotide, after which chain extension terminates because no free 3'-OH group remains. Since incorporation is sequence-specific, the nucleotide opposite the incorporated nucleotide in the double strand can be identified unambiguously. The terminating nucleotides preferably carry labeling groups that can be used to detect their incorporation. In a particularly preferred embodiment, the four dideoxynucleotides carry four distinguishable labeling groups, in particular four different fluorophores, so that the fluorescence signal shows which of the four terminating nucleotides was incorporated and, accordingly, which nucleotide is present on the respective complementary strand. This first embodiment naturally presupposes that the cut nucleic acid fragments from (b) have recessed 3' ends that can be filled in by a polymerase. This can easily be ensured by appropriate selection of the restriction endonuclease in (b). The following type IIS restriction endonucleases are particularly suitable: AarI, AceIII, AlwI, Bbr7I, BbsI, BbvI, BceAI, BcefI, BfuAI, BsaI, BscAI, BsmAI, BsmBI, BsmFI, BspMI, EarI, FauI, FokI, HgaI, PleI, SapI, SfaNI, Sth132I and StsI.

2. Attachment of adapters with overhanging ends of suitable length and type (3' overhang or 5' overhang) to fragments having an overhanging end, the attachment being sequence-specific. The overhanging fragment ends can in particular have been generated with one of the following restriction endonucleases: AarI, AceIII, AloI, AlwI, BaeI, Bbr7I, BbsI, BbvI, BceAI, BcefI, BcgI, BciVI, BfuAI, BmrI, BplI, BpmI, BpuEI, BsaI, BsaXI, BscAI, BseMII, BseRI, BsgI, BsmAI, BsmBI, BsmFI, Bsp24I, BspCNI, BspMI, BsrDI, BstF5I, BtsI, CjeI, CjePI, EarI, EciI, Eco57I, Eco57MI, FalI, F…, HgaI, Hin4I, HphI, MboII, MmeI, MnlI, PleI, PpiI, PsrI, RleAI, SapI, SfaNI, Sth132I, StsI, TaqII, TspDTI, TspGWI and Tth111II. Preferably, several adapters (so-called sequencing adapters) that differ in their overhangs are used. Sequential or parallel procedures, in which different adapters are used in separate attachment reactions, are of course also conceivable. It is particularly preferred to use labeled adapters that differ both in their overhang and in their labeling group. In one embodiment, the labeling groups are fluorophores, so that the fluorescence of the attachment products shows which adapter has been attached to a given fragment end. To determine the identity of the base forming a single-base overhanging end of a fragment in the mixture, adapters of the general structure

F-adapter-X

can be used, for example, where "adapter" denotes the double-stranded portion of the adapter, X represents one of the four possible nucleotides in the form of a single-stranded overhang, and F denotes a fluorophore that identifies the overhanging base X. The following assignment could be made:

Thus, for example, a ROX signal obtained when the attachment products are separated on an automated nucleic acid sequencer indicates that the adapter with an overhanging G has been attached to a certain fragment and that the overhanging base of that fragment is accordingly a C.
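The decoding step for single-base overhangs amounts to two lookups: dye to adapter overhang base X, then X to its pairing partner on the fragment. In the sketch below only the ROX-to-G pairing comes from the example above; the other three dye assignments are hypothetical placeholders, as are the fragment names. The same logic would apply to procedure 1, where the dye identifies the incorporated ddNTP and the template base is its complement.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye -> adapter-overhang assignment. Only ROX -> G is taken from
# the example in the text; the other three pairings are placeholders.
DYE_TO_ADAPTER_BASE = {"FAM": "A", "JOE": "C", "ROX": "G", "TAMRA": "T"}


def fragment_base(dye: str) -> str:
    """Base of the fragment's single-base overhang, i.e. the pairing partner
    of the adapter overhang X identified by the dye F."""
    return COMPLEMENT[DYE_TO_ADAPTER_BASE[dye]]


# Hypothetical peaks from one separation: fragment (by mobility) -> dye seen.
peaks = {"frag_112bp": "ROX", "frag_187bp": "FAM"}
print({frag: fragment_base(dye) for frag, dye in peaks.items()})
# {'frag_112bp': 'C', 'frag_187bp': 'T'}
```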

Multi-base fragment overhangs are usually identified nucleotide by nucleotide. For a two-base overhang, the procedure is as follows: two adapters with the following general structure are used in separate reactions:

(1) F-adapter-NX₁ for identification of the first nucleotide, or

(2) F-adapter-X₂N for identification of the second nucleotide,

where N is a mixture of all four possible nucleotides or a universal nucleotide such as inosine. In a first reaction, the first nucleotide of the two-base fragment overhang is determined by attaching the first adapter; in a second reaction, the second nucleotide is determined by attaching the second adapter. Again, as described above, there is preferably a clear and known relationship between the nature of the fluorophore F and the specific nucleotide X₁ or X₂ used for sequencing. By identifying the first and the second adapter attached to the overhanging end (usually in two parallel reactions, the first determining the first base of the overhang and the second determining the second base), the sequence of the overhanging end can be determined.
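Combining the two parallel reactions into the overhang sequence can be sketched as below. It assumes that the dye signals have already been translated into adapter bases X₁ and X₂, and that the same fragment can be recognized in both separations (for example by identical mobility); fragment names and base calls are hypothetical. Since X₁ pairs with Y₁ and X₂ with Y₂, the overhang is the complement of X₁ followed by the complement of X₂.

```python
from typing import Dict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def decode_two_base_overhangs(reaction1: Dict[str, str],
                              reaction2: Dict[str, str]) -> Dict[str, str]:
    """reaction1 / reaction2 map a fragment (matched across the two parallel
    separations) to the adapter base X1 / X2 read from its dye. X1 pairs with
    Y1 and X2 with Y2, so the overhang Y1Y2 is complement(X1) + complement(X2).
    Fragments seen in only one reaction are skipped."""
    shared = reaction1.keys() & reaction2.keys()
    return {f: COMPLEMENT[reaction1[f]] + COMPLEMENT[reaction2[f]] for f in shared}


r1 = {"frag_A": "G", "frag_B": "A", "frag_C": "T"}  # hypothetical X1 calls
r2 = {"frag_A": "T", "frag_B": "C"}                 # hypothetical X2 calls
print(decode_two_base_overhangs(r1, r2))
```

Longer overhangs would simply need one reaction, and one dictionary, per position.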

In double-stranded representation, a two-base 3' overhang Y₁Y₂ of a fragment is sequenced, for example, according to the following scheme:

First reaction:
fragment:            Y₂Y₁-fragment
+ adapter:           F-adapter-NX₁
ligation product:    F-adapter-N X₁-fragment
                       adapter-Y₂ Y₁-fragment

Second reaction:
fragment:            Y₂Y₁-fragment
+ adapter:           F-adapter-X₂N
ligation product:    F-adapter-X₂ N-fragment
                       adapter-Y₂ Y₁-fragment

The following table shows the overhang Y₁Y₂:

It goes without saying that overhangs longer than two nucleotides, for example of three or four nucleotides, can also be sequenced. Furthermore, more than one base of the generated overhangs can be determined within a single experiment by using labeling groups that allow the simultaneous detection of more than four (usually an integer multiple of four) different labels. In this case, the first set of four labels could be used to identify a first base of the generated fragment overhangs, the second set of four labels to identify a second base, and, if appropriate, further sets of four labels for further bases. Such "multiplexing" would reduce the number of experimental steps required. So-called "quantum dots", for example, could be considered as labeling groups, since numerous different ones can be detected in a common measurement without the measurement results influencing one another (Han et al., Nat. Biotechnol. 19, 631-5 [2001]).

3. Extension of selective oligonucleotide primers whose 3'-terminal nucleotide or nucleotides can hybridize with the nucleotide or nucleotides to be sequenced, followed by identification of the primers that were extended in the extension reaction. If necessary, the extension can be carried out by means of the polymerase chain reaction (PCR).

Linkers or adapters, which can serve as primer binding sites common to all or many fragments, are preferably attached to the ends of the nucleic acid fragments to be sequenced. The oligonucleotide primers are then designed such that, after denaturation of the nucleic acid fragments to be sequenced, they can hybridize to the linker strand attached at the 3' end of the nucleic acid fragment strands. It must be ensured that the primers hybridized in this way "overlap" by one or more nucleotides with the region of the nucleic acid fragment adjacent to the linker region, i.e., that at their 3' end they carry nucleotides that can hybridize with the nucleotides of the nucleic acid fragment where there is complementarity. These are "selective nucleotides": they allow the primer to be extended by a polymerase if they have become part of a double strand as a result of hybridization, but at least largely prevent extension if they cannot form base pairs with the complementary strand.
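Read-out with selective primers can be sketched as a simulation of the four-primer experiment followed by inference of the fragment base. The model assumes perfect discrimination (mismatched primers never extend), which real polymerases do not guarantee; the 9-nt linker-binding part and all names are hypothetical.

```python
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_extension(fragment_base_m: str, primers: List[str]) -> List[str]:
    """Primers (5'->3') whose 3'-terminal selective base N pairs with the
    fragment base M opposite it. Idealized: mismatched primers never extend."""
    return [p for p in primers if COMPLEMENT[p[-1]] == fragment_base_m]


def infer_m(extended: List[str]) -> str:
    """Infer M from the single primer observed to be extended."""
    if len(extended) != 1:
        raise ValueError("no or ambiguous extension")
    return COMPLEMENT[extended[0][-1]]


linker_part = "GACTGCGTA"                    # hypothetical 9-nt linker-binding part
primers = [linker_part + n for n in "ACGT"]  # four selective primers, N = A, C, G, T
observed = simulate_extension("G", primers)  # what the experiment would show
print(observed, infer_m(observed))           # ['GACTGCGTAC'] G
```

In practice, partial extension of mismatched primers would produce more than one signal, which is exactly the source of error attributed to selective primers in the prior-art discussion.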

For example, in the following situation, in which the selective primer 5'-YYYYYYYYYN-3' is hybridized to the linker region XXXXXXXXX of the fragment strand 5'-OOOOOOOOOOOOOOOOOOOOOOOMXXXXXXXXX-3', the hybridized primer can only be extended efficiently if the selective base N of the primer is complementary to the last fragment-specific base M:

5'-YYYYYYYYYN
3'-XXXXXXXXXMOOOOOOOOOOOOOOOOOOOOOOO-5'

One or more nucleotides are identified simultaneously for several or all nucleic acid fragments, preferably after the nucleic acid fragments contained in the mixture have been separated electrophoretically according to a fragment-specific property, in particular according to their size and/or mobility. Gel electrophoresis in slab gels or gel-filled capillaries is particularly preferred. In a preferred embodiment, the enzymatic reactions according to variants 1-3 are carried out in step (c) such that one or two nucleotides of the fragments are identified in parallel reactions, the nucleotides to be identified in the parallel reactions being in a defined position relative to one another, for example adjacent to each other. One or two nucleotides of known position are then first determined for each separated fragment in parallel separations of these reactions, preferably by means of different labeling groups that convey information about the nucleotides to be determined. In a further step, the nucleotides determined for individual or all separated fragments are put into the order in which they occur on the associated starting fragment from the mixture of nucleic acid fragments. The order of these two steps can of course also be reversed. In any case, signatures in the form of short sequence sections characterizing the associated fragment are generated for the examined fragments. These sequence sections are preferably at least 14 bases, more preferably at least 16 bases, in particular at least 20 bases long.
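The step of putting the individually determined nucleotides into their order on the starting fragment can be sketched as follows. The sketch assumes that each parallel reaction reports (fragment, position, base) with the position known from the reaction design, and that fragments are matched across separations, for example by mobility; fragment names and calls are hypothetical. Undetermined positions are shown as N.

```python
from typing import Dict, Iterable, Tuple


def assemble_signatures(calls: Iterable[Tuple[str, int, str]]) -> Dict[str, str]:
    """calls: (fragment id, position, base) from parallel reactions, where
    position is the known 0-based offset of the identified nucleotide on the
    starting fragment. Returns fragment id -> bases in fragment order, with
    'N' for positions not (yet) determined."""
    per_fragment: Dict[str, Dict[int, str]] = {}
    for frag, pos, base in calls:
        per_fragment.setdefault(frag, {})[pos] = base
    return {frag: "".join(bases.get(i, "N") for i in range(max(bases) + 1))
            for frag, bases in per_fragment.items()}


calls = [("frag_112bp", 1, "A"), ("frag_112bp", 0, "G"), ("frag_112bp", 3, "T"),
         ("frag_187bp", 0, "C"), ("frag_187bp", 1, "C")]
print(assemble_signatures(calls))
# {'frag_112bp': 'GANT', 'frag_187bp': 'CC'}
```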
In addition to one or more sequence sections, a signature can also contain other information characterizing a fragment. Examples are exact or approximate distances (in base pairs) between characteristic regions of the fragment, estimated from the electrophoretic mobility with the aid of an internal length standard: between two known sequence sections, between a known sequence section and a fragment end, or between the two fragment ends. In this case, the sequence sections are preferably at least 10 bases long. In any case, the information content of a signature is preferably large enough to allow unambiguous identification and/or isolation of the associated fragment. For example, experience has shown that approximately 14-20 base pairs of sequence information are usually sufficient, without additional information about distances within the fragment in question. This is enough to recognize a transcript containing this sequence segment in a mixture of cDNA molecules and to identify the associated gene. "Tag sequencing" methods exploit this fact, for example SAGE (Velculescu et al., Science 270: 484-487 [1995]; WO 00/53806) or MPSS (Brenner et al., Nature Biotechnol. 18: 630-634 [2000]). The partial sequence needed to identify a transcript uniquely within the transcriptome must usually be longer than the theoretical minimum. This is because the nucleotide sequence of genomes is not purely random, and certain nucleotide sequences occur preferentially. A signature of 8 nucleotides could theoretically distinguish 4⁸ = 65,536 different transcripts. In practice, however, such a signature is shared by numerous different human cDNAs, even though the number of human genes is currently estimated at only about 30,000-40,000. To ensure uniqueness, the information content of a signature must therefore exceed the theoretical minimum.
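A little arithmetic makes the gap between the theoretical minimum and the practically required signature length concrete. The sketch below assumes an idealized random sequence, which real genomes are not, as noted above. It also assumes a hypothetical transcriptome of 30,000 transcripts of about 2,000 nt each. Even under these favorable assumptions, a given 8-mer is expected to occur several hundred times by chance, whereas a 14-mer is expected to occur less than once.

```python
def min_signature_length(n_transcripts: int) -> int:
    """Smallest n with 4**n >= n_transcripts, i.e. the theoretical minimum
    for an idealized random sequence."""
    n = 1
    while 4 ** n < n_transcripts:
        n += 1
    return n


def expected_chance_matches(n: int, transcriptome_bases: float) -> float:
    """Expected occurrences of one fixed n-mer in random sequence of the given
    total length (one strand; overlaps and sequence bias ignored)."""
    return transcriptome_bases / 4 ** n


# Assumed for illustration only: 30,000 transcripts of ~2,000 nt each.
TRANSCRIPTOME_BASES = 30_000 * 2_000

print(min_signature_length(40_000))  # 8, since 4**7 = 16,384 < 40,000
for n in (8, 14, 16, 20):
    print(n, round(expected_chance_matches(n, TRANSCRIPTOME_BASES), 4))
```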
The information content of a signature identifying a fragment can be increased, among other things, by the following information:

1. Longer sequence,

2. Information about the actual or approximate length, including the length of regions of the fragment whose sequence is unknown,

3. Pre-selection of possible identities.

When possible identities are preselected, the number or probability of incorrect assignments is reduced. This is done by means of an additional statement about the fragment to be identified, or about the transcripts or genes in question. An additional statement about the fragment could be, for example, "3'-terminal fragment of double-stranded cDNA generated by the restriction endonuclease RsaI". A match between the sequence portion of a signature and a region of a transcript located upstream of (5' to) its 3'-terminal RsaI fragment would then be recognized as insignificant. Likewise, signatures whose sequence portion is in the wrong orientation relative to the 5'→3' direction of an mRNA sequence, or of the cDNA sequence derived from it, would not be recognized as significant. In these cases, the additional information is knowledge of the molecular-biological procedure by which the signatures were generated, which rules out certain partial sequences as a signature or part of a signature. An additional statement about the genes in question could be, for example, "from the totality of all genes expressed in the leaf". This applies if transcripts from leaf samples are to be identified by means of the signatures generated; genes expressed only in the root, for example, would then not need to be considered.
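The positional and orientation filters just described can be sketched as a simple check against a candidate transcript. This is a minimal sketch with several assumptions. The transcript is given as its sense (mRNA-like) DNA sequence. A significant signature match must start at or after the 3'-most RsaI site (GTAC). The function name is hypothetical. Searching only the sense strand is what enforces the orientation rule.

```python
RSAI_SITE = "GTAC"  # RsaI recognition sequence


def significant_hits(transcript_sense: str, signature: str) -> list[int]:
    """Start positions of signature matches that are compatible with the
    statement "3'-terminal RsaI fragment of double-stranded cDNA".

    Only the sense strand is searched (orientation rule), and matches that
    start upstream of the 3'-most RsaI site are discarded (position rule).
    """
    last_site = transcript_sense.rfind(RSAI_SITE)
    if last_site == -1:
        return []  # no RsaI site, hence no 3'-terminal RsaI fragment
    hits = []
    start = transcript_sense.find(signature)
    while start != -1:
        if start >= last_site:
            hits.append(start)
        start = transcript_sense.find(signature, start + 1)
    return hits


transcript = "AAGTACCCCGTACAGTAGGGAAAA"  # hypothetical sense sequence
print(significant_hits(transcript, "GTACAGTA"))  # [9]
print(significant_hits(transcript, "CCCC"))      # [] (upstream of last site)
```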
In a preferred embodiment of the method according to the invention for the analysis of nucleic acid fragments, the simultaneous identification of one or more nucleotides of the cut nucleic acid fragments in step c) takes place via the following individual steps:

ca) identification of a first nucleotide of the cut nucleic acid fragments from b), the identification being carried out simultaneously for several or all nucleic acid fragments,
cb) optionally, identification of a further nucleotide of the cut nucleic acid fragments from b), the identification being carried out simultaneously for several or all nucleic acid fragments,
cc) optionally, repetition of step (cb) until the desired number of nucleotides has been identified,
cd) combination of the sequence information obtained in steps (ca) to (cc) for a selected group of, or all, nucleic acid fragments into fragment-specific signatures, a signature being able to contain further information about the respective fragment in addition to the sequence information,

wherein the nucleotide identification in steps ca) to cc) optionally also includes the separation of the nucleic acid fragments of the mixture.
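Steps ca) to cd) amount to collecting one base call per fragment in each round and then concatenating the calls in fragment order. A minimal sketch, assuming three things. First, the rounds are already ordered as the identified positions occur on the fragment. Second, fragments are tracked by a hypothetical identifier (in practice, for example, their position in the separation). Third, "N" marks a position left unidentified.

```python
def assemble_signatures(rounds, extra_info=None):
    """Combine per-round base calls (steps ca-cc) into signatures (step cd).

    rounds: list of dicts, one per identification round, ordered as the
        identified positions occur on the fragment; each maps a fragment id
        to the base identified in that round.
    extra_info: optional dict fragment id -> further information (e.g. an
        apparent length) to be stored with the signature.
    """
    fragment_ids = set().union(*rounds)  # every fragment seen in any round
    signatures = {}
    for fid in sorted(fragment_ids):
        # "N" marks a position for which no base was identified
        sequence = "".join(r.get(fid, "N") for r in rounds)
        signature = {"sequence": sequence}
        signature.update((extra_info or {}).get(fid, {}))
        signatures[fid] = signature
    return signatures


rounds = [{"f1": "A", "f2": "C"}, {"f1": "G", "f2": "T"}, {"f1": "T"}]
extra = {"f1": {"apparent_length_bp": 200}}
print(assemble_signatures(rounds, extra))
```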

In a further preferred embodiment, at least a subset of the mixture of nucleic acid fragments provided in step a) is subjected to the following process steps aa) to ad):

aa) separation of the mixture of nucleic acid fragments according to at least one fragment-specific property,
ab) optionally, detection of the relative frequency of some or all fragments in the separated mixture,
ac) optionally, comparison of the information obtained in (aa) and/or (ab) about the composition of different mixtures of nucleic acid fragments according to step (a),
ad) optionally, registration of nucleic acid fragments detected in (ab) and/or (ac) that occur in different mixtures of nucleic acid fragments at different relative frequencies,

while a fragment mixture selected from groups I) to III) is treated according to steps b) and c), where
I) is a further subset of the mixture of nucleic acid fragments provided in step a),
II) is a subset of the mixture of nucleic acid fragments provided in step a) that has previously been separated according to at least one fragment-specific property,
III) is a mixture of nucleic acid fragments that is at least partially identical to I) or II).

It is further preferred that in one of the above processes according to the invention, at least one fragment of one of groups (I) to (III) is obtained in an additional process step.

The fragments of interest are preferably obtained from a mixture of nucleic acid fragments by specific PCR amplification using fragment-specific oligonucleotide primers. These primers can be designed on the basis of the signatures determined in step (cd).

A further preferred embodiment relates to one of the above methods according to the invention. In this embodiment, a mixture of nucleic acid fragments according to step a), or a subset of this mixture, is provided that was produced by the following steps:
i) flanking the restriction fragments of the mixture on both sides with identical or different adapters,
ii) hybridizing the fragments from step (i) with different primers, all of which have regions complementary to the adapters from step (i). At their 3' ends, the primers carry one or more nucleotides which, after hybridization of the primer with its target sequence, protrude beyond the region complementary to the adapter. These nucleotides are complementary to the nucleotides opposite them in the double strand of a subset of the fragments from the nucleic acid mixture from (a),
iii) sequence-specific extension of the primers from ii) and, if appropriate, subsequent PCR amplification of those nucleic acid fragments of the mixture on which primers were extended in a sequence-specific manner.

Sequence-specific extension means that, in step iii), only those primers are extended whose 3'-terminal nucleotide or nucleotides are complementary to the opposite nucleotides of the fragment. This is the fragment with which they have formed a hybridized nucleic acid double strand.
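The effect of steps i) to iii) is to divide the mixture into subpools according to the bases adjoining the adapter. With n selective nucleotides, there are 4**n primer variants and hence 4**n subpools. This sketch assumes idealized, perfect-match extension. It indexes each fragment by its insert sequence on the strand whose sequence the primer shares, read 5'→3' directly after the adapter. The function name and example sequences are hypothetical.

```python
from itertools import product


def subpools(fragments: dict[str, str], n_selective: int) -> dict[str, list[str]]:
    """Group fragments into 4**n subpools by the n bases adjoining the adapter.

    fragments: fragment id -> insert sequence on the strand whose sequence the
    primer shares, read 5'->3' starting right after the adapter.
    Each key is the selective 3' extension of one primer variant; a fragment
    lands in the subpool whose primer matches it perfectly (idealized).
    """
    pools = {"".join(p): [] for p in product("ACGT", repeat=n_selective)}
    for fid, insert in fragments.items():
        key = insert[:n_selective]
        if key in pools:  # skips inserts shorter than n or with ambiguous bases
            pools[key].append(fid)
    return pools


fragments = {"f1": "ACGTT", "f2": "AGCTA", "f3": "ACTTG"}  # hypothetical
pools = subpools(fragments, 2)
print(len(pools), pools["AC"], pools["AG"])  # 16 ['f1', 'f3'] ['f2']
```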

In a particularly preferred embodiment of the method according to the invention, a method for gene expression analysis is provided, comprising the following steps:

a1) provision of at least one mixture of nucleic acid fragments, in particular at least one mixture of cDNA fragments,
b1) separation of the mixture of nucleic acid fragments according to at least one fragment-specific property,
c1) optionally, detection of the relative frequency of some or all of the fragments in the separated mixture,
d1) optionally, comparison of the information obtained in (b1) and/or (c1) about the composition of different mixtures of nucleic acid fragments from (a1),
e1) optionally, registration of nucleic acid fragments detected in (d1) that occur in different mixtures of nucleic acid fragments at different relative frequencies,
f1) incubation of a mixture of nucleic acid fragments with the restriction endonuclease or restriction endonucleases that cut outside their recognition site. The mixture is selected from Group I: a subset of the mixture from (a1); Group II: the mixture of cDNA fragments separated in (b1), or a part thereof; Group III: a mixture of nucleic acid fragments that is at least partially identical to the mixture from (a1) or to the separated mixture from (b1), but additionally comprises at least one recognition site for a restriction endonuclease that cuts outside its recognition site,
g1) identification of a first nucleotide of the cut nucleic acid fragments from (f1), the identification being carried out simultaneously for several or all nucleic acid fragments,
h1) optionally, identification of a further nucleotide of the cut nucleic acid fragments from (f1), the identification being carried out simultaneously for several or all nucleic acid fragments,
i1) optionally, repetition of step (h1) until the desired number of nucleotides has been identified,
j1) optionally, repetition of steps (f1) to (i1) one or more times, the position and/or sequence of the recognition site being changed such that each repetition allows the identification of nucleotides not previously identified,
k1) combination of the sequence information obtained in steps (g1) to (j1) for a selected group of, or all, nucleic acid fragments into fragment-specific signatures, a signature being able to contain further information about the respective fragment in addition to the sequence information,

l1) optionally, obtaining fragments of interest from the mixture of nucleic acid fragments from (a1) or (b1), the fragments of interest being the fragments registered in (e1),
m1) optionally, identification of the genes from which the nucleic acid fragments of interest are derived, by searching electronic databases, where the fragments of interest can be the fragments registered in (e1).

When steps (f1) to (i1) are repeated, changing the position and/or sequence of the recognition site ensures that nucleotide positions other than those previously examined are converted into single-stranded overhangs. Further, previously unidentified nucleotides can thus be identified. Instead of being carried out sequentially, the repetitions can of course also be carried out simultaneously in parallel reactions. A preferred procedure is as follows. At least one fragment mixture is provided in which many or all fragments have identical ends, for example blunt ends or overhanging ends of the same length and sequence. This mixture is divided, for example, into 10 substantially equal aliquots. Each aliquot is mixed with one of a selection of different adapters (here one of 10 different adapters) and exposed to ligation conditions. All adapters have an end that is compatible with the fragment ends, that is to say ligatable to them. Furthermore, all adapters have at least one recognition site for a restriction endonuclease that cuts outside its recognition sequence, for example MmeI. The adapters differ from one another in the distance between the recognition sequence and the adapter end to be attached to the fragment ends. In a particularly preferred embodiment, two different adapters differ in this distance by an integer multiple of the length of the overhanging ends that said restriction endonuclease generates. MmeI has the cutting characteristic TCCRAC(20/18) and thus produces 2-nt overhangs. With MmeI, the distance is accordingly 18 bp in one adapter, 16 bp in another, and 14, 12, 10, 8, 6, 4, 2 and 0 bp in the remaining adapters.
If all 10 adapter ligation products are then incubated with the restriction endonuclease, here MmeI, different bases of the fragments are present as a single-stranded overhang in each reaction. These are bases 19 and 20 in the first reaction, bases 17 and 18 in the second reaction, and bases 15 and 16, 13 and 14, 11 and 12, 9 and 10, 7 and 8, 5 and 6, 3 and 4, or 1 and 2 in the remaining reactions. The complete set of 10 reactions thus allows the identification of a contiguous, 20-base partial sequence or signature for the fragments present in the fragment mixture. Instead of changing the position of a cleavage site, it would of course also be conceivable to place, at the same position, recognition sites for restriction endonucleases that cut at different distances from their recognition sites. For example, one adapter could carry, at its end to be ligated to the fragment ends, a recognition site for EarI (cutting characteristic CTCTTC(1/4)). A second adapter could carry a recognition site for SfaNI (cutting characteristic GCATC(5/9)) at the same position, and a third adapter a recognition site for StsI (cutting characteristic GGATG(10/14)). This would make it possible to identify 13-base partial sequences of the fragments using the method according to the invention. A combination of both approaches (changing position and sequence) is of course also conceivable.
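A short calculation can check the positional bookkeeping behind the ten-adapter MmeI scheme. This sketch assumes two things. First, the (N/M) numbers count from the end of the recognition site on the respective strands. Second, the adapter is ligated directly to the fragment, so that the adapter distance simply shifts the cut into the fragment by that many bases. It models positions only, not enzyme behavior. The function name is hypothetical.

```python
def overhang_positions(top_cut: int, bottom_cut: int, distance: int) -> range:
    """Fragment positions left single-stranded by a type IIS enzyme.

    top_cut/bottom_cut: the (N/M) cutting characteristic, counted from the
        end of the recognition site; distance: bp of adapter between the end
        of the recognition site and the adapter/fragment junction.
    Position 1 is the first fragment base next to the adapter.
    """
    first = min(top_cut, bottom_cut) + 1 - distance
    last = max(top_cut, bottom_cut) - distance
    return range(first, last + 1)


MMEI = (20, 18)  # TCCRAC(20/18)
covered = set()
for d in range(18, -1, -2):  # adapters with 18, 16, ..., 0 bp spacing
    bases = list(overhang_positions(*MMEI, d))
    covered.update(bases)
    print(f"{d:2d} bp -> fragment bases {bases}")

print(covered == set(range(1, 21)))  # True: a contiguous 20-base signature
```

The 0 bp adapter exposes bases 19 and 20 and the 18 bp adapter exposes bases 1 and 2. Together, the ten reactions tile positions 1 to 20 without gaps.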

In a further particularly preferred embodiment of the method according to the invention, a method for gene expression analysis is provided, comprising the following steps:

a2) provision of at least one mixture of nucleic acid fragments, in particular a mixture of cDNA fragments, having at least one recognition site for a restriction endonuclease that cuts outside its recognition site, the recognition site being located on linkers attached to the starting fragments,
b2) incubation of the mixture of nucleic acid fragments from (a2) with the restriction endonuclease or restriction endonucleases from step (a2),
c2) identification of a first nucleotide of the cut nucleic acid fragments from (b2), the identification being carried out simultaneously for several or all nucleic acid fragments of the mixture. For identification of the nucleotide, the suitably treated nucleic acid fragments are separated according to at least one fragment-specific property,
d2) optionally, identification of a further nucleotide of the cut nucleic acid fragments from (b2) according to (c2),
e2) optionally, repetition of step (d2) until the desired number of nucleotides has been identified,
f2) optionally, repetition of steps (a2) to (e2) one or more times, the position and/or sequence of the recognition site being changed such that each repetition allows the identification of previously unidentified nucleotides,
g2) combination of the sequence information obtained in steps (c2) to (f2) for a selected group of, or all, nucleic acid fragments into fragment-specific signatures, a signature being able to contain further information about the respective fragment in addition to the sequence information,
h2) assignment of the fragment-specific information obtained in the separation in (c2) to the signatures obtained in (g2) for the nucleic acid fragments. In the case of an electrophoretic separation of the fragments, the fragment-specific information includes the relative or absolute mobility of the fragments and/or the apparent or actual fragment length determined on the basis of a length standard. The assignment can take place in tabular and/or computer-readable form,
i2) optionally, identification of the genes from which the nucleic acid fragments are derived, by searching electronic databases for the signatures from (g2),
j2) optionally, provision of at least one further mixture of nucleic acid fragments, in particular a mixture of cDNA fragments, obtained in a manner analogous to the mixture of nucleic acid fragments from (a2). Here, the addition of linkers having at least one recognition site for a restriction endonuclease that cuts outside its recognition site can be dispensed with,
k2) separation of the mixture or mixtures of nucleic acid fragments from (j2) according to a fragment-specific property, essentially under the conditions of the separation in (c2),
l2) assignment of the fragment-specific information obtained in the separation in (k2) to the individual separated fragments. The fragment-specific information comprises the relative or absolute frequency of the individual fragments. In the case of an electrophoretic separation of the fragments, it may also include the relative or absolute mobility of the fragments and/or the apparent or actual fragment length determined on the basis of a length standard. The assignment may be in tabular and/or computer-readable form,
m2) optionally, comparison of the relative or absolute frequencies of at least some of the fragments separated in (k2) with the relative or absolute frequencies of the respective homologous fragments originating from different mixtures of nucleic acid fragments, homologous meaning completely or essentially sequence-identical,
n2) optionally, registration of those fragments whose relative or absolute frequency differs from that of their homologous fragments in other mixtures of nucleic acid fragments by at least a preselected factor,
o2) optionally, assignment of the fragments registered in (n2) to the genes or transcripts from which said fragments are derived, using the results obtained in step (h2),
p2) optionally, obtaining the fragments registered in (n2) from the mixture of nucleic acid fragments from (a2) and/or (j2),

wherein steps (j2) to (n2) can also be carried out before steps (a2) to (h2).
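Steps (h2) and (l2) produce two tables that share the separation coordinate, which allows signatures and frequencies to be joined. A minimal sketch, assuming both mixtures are separated under essentially the same conditions and that apparent lengths can be compared within a tolerance. The tolerance of 0.5 bp and the `offset_bp` correction are hypothetical parameters. The offset would cover, for example, a length difference if the linkers are omitted in (j2).

```python
def assign_by_length(signature_table, frequency_table,
                     tolerance_bp=0.5, offset_bp=0.0):
    """Join signatures from (h2) with frequencies from (l2) via apparent length.

    signature_table: list of (apparent_length_bp, signature)
    frequency_table: list of (apparent_length_bp, relative_frequency)
    offset_bp: hypothetical correction if the two mixtures differ in linker
        length (e.g. when linkers are omitted in (j2)).
    Peaks with no match, or with more than one candidate, are left unassigned.
    """
    assigned = []
    for length, frequency in frequency_table:
        corrected = length + offset_bp
        candidates = [sig for sig_len, sig in signature_table
                      if abs(sig_len - corrected) <= tolerance_bp]
        if len(candidates) == 1:
            assigned.append((candidates[0], frequency))
    return assigned


signatures = [(152.3, "GTACAGTA"), (201.1, "CTCA{192}GGAT")]  # hypothetical
frequencies = [(152.1, 0.8), (201.4, 2.5), (240.0, 1.0)]
print(assign_by_length(signatures, frequencies))
```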

Mixtures of nucleic acid fragments, preferably mixtures of cDNA fragments, can be produced by methods known from the prior art. One example is EP 0 743 367, to which reference is hereby made in full. It describes the generation of fragments that are obtained by means of frequently cutting restriction endonucleases, represent the 3' ends of cDNA molecules, and are flanked on one side by linkers. These fragments are amplified in the form of several subgroups ("subpools") by means of selective PCR primers. Such primers are extended at their 3' end, beyond the "universal" binding site common to all primers of a given type, by one or more "selective" nucleotides. Each of these subgroups then consists of a subset of all the cDNA 3' fragments initially generated. Corresponding fragment subpools (i.e. subpools generated with the same selective primers) are obtained from different RNA preparations to be examined for differentially expressed genes. These subpools are then separated by gel electrophoresis according to their size, and the resulting band or signal patterns are compared with each other. Bands or signals originating from homologous fragments whose intensity differs between samples represent genes that are expressed to different extents in the compared samples (see, for example, Fig. 1 of EP 0 743 367). Further alternative methods for producing mixtures of cDNA fragments for expression analysis are known from the prior art, cf. for example Kato, Nucleic Acids Res. 23: 3685-3690 (1995); Ivanova et al., Nucleic Acids Res. 23: 2954-2958 (1995); Bachem et al., Plant J. 9: 745-753 (1996); Prashar et al., Proc. Natl. Acad. Sci. USA 93: 659-663 (1996); Shimkets et al., Nat. Biotechnol. 17: 798-803 (1999); Ke et al., Analyt. Biochem. 269: 201-204 (1999); Jing et al., Analyt. Biochem. 287: 334-337 (2000); Sutcliffe et al., Proc. Natl. Acad. Sci. USA 97: 1976-1981 (2000); WO 99/42610; EP 0 981 609.

The fragment-specific property is a property, in particular a physical or physicochemical property, that different molecules can exhibit within a continuum or in a larger number of different gradations or forms (for example at least 10 or at least 100). It is particularly preferred to use the different mobility of different nucleic acid fragments in separation systems. This applies in particular to their different electrophoretic mobility in electrophoresis systems such as agarose or polyacrylamide gel electrophoresis. Mobility is usually influenced by the length of a fragment. However, the relationship is not strictly linear, since the G/C content and conformation of a nucleic acid molecule also influence mobility. The mobility of a nucleic acid molecule can therefore usually be used only for approximate, not absolute, size determination. Furthermore, the fragment-specific property can be a specific partial sequence of n nucleotides, where n is equal to or greater than 1. Said partial sequence preferably adjoins a linker attached to the end of the fragment. A mixture of different fragments can then be separated according to this partial sequence by extension of selective oligonucleotide primers, optionally repeated extension in the form of an amplification. Such a procedure is described, for example, in EP 0 743 367. In this case, "separation of a fragment mixture" means the production of mixtures of amplified fragments, each of which contains amplified copies of only part of the fragments present in the starting mixture. In another preferred case, said partial sequence is at least partly present as a single-stranded overhang. A mixture of different fragments is then separated according to this partial sequence by attaching adapters with compatible overhangs. This process, also called "categorization of nucleotide sequence populations", is described in WO 94/01582.
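Because mobility gives only an approximate size, a length standard is used to convert a measured retention time into an apparent length. The sketch below uses local linear interpolation between the two flanking peaks of an internal size standard. This is one common sizing approach, assumed here for illustration and not prescribed by the text. The ladder values are hypothetical. As stated above, sequence-dependent mobility effects mean the result remains an apparent length.

```python
from bisect import bisect_left


def apparent_length(retention_time, standard):
    """Estimate the apparent length (bp) of a fragment by linear
    interpolation between the two flanking peaks of an internal size standard.

    standard: list of (retention_time, length_bp), sorted by retention_time.
    """
    times = [t for t, _ in standard]
    if not times[0] <= retention_time <= times[-1]:
        raise ValueError("fragment lies outside the range of the size standard")
    i = max(bisect_left(times, retention_time), 1)  # index of the upper flank
    (t0, l0), (t1, l1) = standard[i - 1], standard[i]
    return l0 + (retention_time - t0) * (l1 - l0) / (t1 - t0)


standard = [(1000, 100), (1200, 150), (1400, 200)]  # hypothetical ladder
print(apparent_length(1300, standard))  # 175.0
```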
A combination of both measures is also conceivable and is described, for example, in WO 01/75180. The relative frequency of some or all of the fragments is detected by measuring the signal strength obtained when the individual nucleic acid fragments are detected. In a preferred embodiment, the nucleic acid fragments carry detectable labeling groups, fluorophores being particularly preferred. If, for example, an automatic sequencing device is used for separation and detection, the relative frequency of a fragment is readily available as a numerical value. This value is the area under the corresponding curve in the fluorogram, the plot of measured fluorescence intensity against retention time. A fragment here means the entirety of all sequence-identical nucleic acid molecules of a mixture, possibly together with the complementary nucleic acid molecules. The numerical values obtained as relative frequencies of fragments are often stored in computer-readable form.
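Reading the relative frequency off the fluorogram comes down to integrating each peak and normalizing. A minimal sketch, using the trapezoidal rule over peak windows that are assumed to be already known. Peak detection and baseline subtraction are deliberately omitted, and the trace values are hypothetical.

```python
def peak_area(times, intensities, start, end):
    """Trapezoidal area under the fluorogram between start and end
    (baseline subtraction and peak detection are omitted)."""
    area = 0.0
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if start <= t0 and t1 <= end:
            area += (t1 - t0) * (intensities[i] + intensities[i + 1]) / 2
    return area


def relative_frequencies(areas):
    """Normalize peak areas so that they sum to 1."""
    total = sum(areas.values())
    return {fid: a / total for fid, a in areas.items()}


times = [0, 1, 2, 3, 4, 5, 6]   # hypothetical retention times
signal = [0, 2, 4, 2, 0, 1, 0]  # hypothetical fluorescence trace
areas = {"A": peak_area(times, signal, 0, 4), "B": peak_area(times, signal, 4, 6)}
print(areas)  # {'A': 8.0, 'B': 1.0}
print(relative_frequencies(areas))
```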

In the step of registering nucleic acid fragments of different relative frequencies (preferably cDNA fragments), those fragments are identified whose proportion differs between different biological samples or between different mixtures of cDNA fragments. Care can be taken to generate the cDNA fragments from the mRNA molecules present in the samples so that their frequency distribution is similar or even equal to that of the different mRNA molecules. In that case, cDNA fragments whose frequency differs between compared fragment mixtures also indicate mRNA molecules of different frequency, and thus differentially expressed genes. Minor fluctuations can arise, for example, in the efficiency of the preceding enzymatic steps or in detection. To compensate for them, a threshold value for frequency differences can optionally be set. For example, only those cDNA fragments whose relative frequency differs between compared fragment mixtures by at least a factor of two might be investigated further.
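The threshold rule above can be sketched as a fold-change filter over the relative frequencies of two compared mixtures. This sketch assumes the frequencies are already normalized (for example, peak areas divided by the total). The `floor` value is a hypothetical stand-in for a detection limit, so that a fragment absent from one mixture does not cause a division by zero.

```python
def register_differential(freq_a, freq_b, factor=2.0, floor=1e-6):
    """Register fragments whose relative frequency differs between two
    compared mixtures by at least `factor`, in either direction.

    Returns fragment id -> fold change (b / a). `floor` is a hypothetical
    stand-in for a detection limit, so that fragments missing from one
    mixture do not cause a division by zero.
    """
    registered = {}
    for fid in sorted(set(freq_a) | set(freq_b)):
        a = max(freq_a.get(fid, 0.0), floor)
        b = max(freq_b.get(fid, 0.0), floor)
        if max(a, b) / min(a, b) >= factor:
            registered[fid] = b / a
    return registered


mixture_a = {"f1": 1.0, "f2": 0.5, "f3": 2.0}            # hypothetical values
mixture_b = {"f1": 1.5, "f2": 1.2, "f3": 0.9, "f4": 0.3}
print(sorted(register_differential(mixture_a, mixture_b)))  # ['f2', 'f3', 'f4']
```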

As described above, the simultaneous identification of one or more nucleotides for several or all nucleic acid fragments is preferably carried out on the overhanging fragment ends. These ends are generated by means of at least one restriction endonuclease cutting outside its recognition site. A process whose outcome depends on the identity of the nucleotide to be identified is carried out on them. This process uses a mixture of several or all nucleic acid fragments, and its result can preferably be observed via the incorporation of a label, in particular a fluorescent label. The identified nucleotides preferably lie close to one another, so that the nucleotide identities thus obtained yield a contiguous partial sequence of the respective nucleic acid fragment. In a preferred embodiment, the products formed are separated after this process (the "sequencing reaction") has been completed. The separation can again take place according to the fragment-specific property from (b1) or (c2).

When the sequence information obtained is combined into fragment-specific signatures, the nucleotide identities obtained for certain positions are assigned to each or some of the separated nucleic acid molecules. The information obtained about a fragment is called its signature. In addition to sequence information, the signature can contain further information, for example sequence information obtained in another way or the approximate fragment size derived from the fragment mobility. For example, cDNA 3' fragments may be generated according to the above-mentioned EP 0 743 367 using the restriction endonuclease RsaI (recognition sequence GTAC). Suppose that, for a selected fragment, the nucleotides identified in steps (g1) to (j1), counted from the RsaI recognition site, are A (1st nucleotide), G (2nd nucleotide), T (3rd nucleotide) and A (4th nucleotide). The sequence signature GTACAGTA can then be generated from this. Besides the approximate fragment size, further information could be added. Assuming the RsaI digestion is complete, no further GTAC site can occur between the partial sequence GTAC and the 3' end of the fragment. In any case, fragment-specific signatures can be determined for all or some of the fragments contained in a fragment mixture. When the method according to the invention is used for comparative gene expression analysis, signatures are determined in particular for those fragments whose relative frequency differs between the fragment mixtures to be compared by at least a fixed factor. Moreover, the sequence portion of a signature does not necessarily have to be contiguous. For example, partial nucleotide sequences of both ends of a given fragment can be determined and combined to form a signature. Additional information, such as approximate fragment lengths, can of course also be included in the signature.
For example, for a particular fragment, the signature

5'-CTCA {192} GGAT-3'

could mean the following. The fragment "begins" with the nucleotide sequence CTCA at its 5' end and "ends" with the nucleotide sequence GGAT at its 3' end. In total, it is approximately 200 bp (= 4 bp + 192 bp + 4 bp) long, possibly plus terminal linker regions. The word "approximately" reflects that determining fragment length from electrophoretic mobility is, as stated above, subject to a certain error.
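A signature of this two-ended form can be represented as a small data structure that renders the notation used above and checks candidate sequences against it. A sketch only. The class name, the `is_consistent_with` helper and its default tolerance of 10 bp are hypothetical choices. The tolerance stands in for the sizing error mentioned above, and candidate sequences are assumed to have their linkers removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndSignature:
    """Signature made of two end sequences and an approximate gap length."""
    five_prime: str   # sequence read at the 5' end, e.g. "CTCA"
    gap_bp: int       # approximate length of the unsequenced middle part
    three_prime: str  # sequence read at the 3' end, e.g. "GGAT"

    def __str__(self) -> str:
        return f"5'-{self.five_prime} {{{self.gap_bp}}} {self.three_prime}-3'"

    def approximate_length(self) -> int:
        return len(self.five_prime) + self.gap_bp + len(self.three_prime)

    def is_consistent_with(self, sequence: str, tolerance_bp: int = 10) -> bool:
        """Check a candidate sequence (linkers removed) against the signature;
        the length tolerance reflects the error of mobility-based sizing."""
        return (sequence.startswith(self.five_prime)
                and sequence.endswith(self.three_prime)
                and abs(len(sequence) - self.approximate_length()) <= tolerance_bp)


sig = EndSignature("CTCA", 192, "GGAT")
print(sig, sig.approximate_length())  # 5'-CTCA {192} GGAT-3' 200
```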

Fragments of interest can be obtained from the mixture of nucleic acid fragments (preferably cDNA fragments) with the aid of the fragment-specific signatures determined, for example by PCR using gene-specific primers. In the example above, the mixture of 3'-cDNA fragments was obtained by cutting with the restriction endonuclease RsaI and then ligating linkers to the (blunt) fragment ends. Suppose the signature GTACAGTA was obtained for a selected fragment. The RsaI cut removes the first two nucleotides of the RsaI recognition site (GT). It is therefore known that the first nucleotides following the linker sequence in this fragment are ACAGTA. A primer for PCR amplification can then be used that carries exactly this nucleotide sequence ACAGTA at its 3' end, immediately following the linker sequence. With this primer, the associated fragment can be accessed directly by amplification from the fragment mixture, because the primer selectively promotes amplification of those fragments with which it is sequence-identical (or complementary) over its entire length. The fragment obtained in this way can then be analyzed further, for example by sequencing followed by a database query for identical or similar sequence entries. This procedure of course requires a sufficiently high information content of the signature, i.e. a sufficient length and thus specificity of the fragment-specific region of the amplification primer. If the partial sequence ACAGTA directly adjoined the linker region in more than one fragment of the mixture, said primer would amplify a mixture of these fragments. To obtain a single fragment of interest in the manner described, the primer would therefore have to be extended at its 3' end by further specific bases.
It must also be taken into account that polymerases discriminate against primers hybridized to the template strand with partial mismatches. This ability decreases with increasing distance from the 3' end of the primer. If a primer is extended at its 3' end by further fragment-specific bases to increase specificity, some loss of specificity must therefore be expected for the bases that immediately follow the linker-complementary section of the primer.
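Primer design from an RsaI-based signature, and the need for additional 3' bases when the 6-base region is not unique, can be sketched as follows. This uses an idealized perfect-match model, which deliberately ignores the mismatch tolerance discussed in the preceding paragraph. The linker sequence `LINKER_3P`, the function names and the fragment mixture are all hypothetical. The extra base "G" stands for a base that a longer signature would supply.

```python
RSAI_SITE = "GTAC"
RSAI_CUT = 2              # RsaI cuts GT^AC, so "GT" is removed from the fragment end
LINKER_3P = "CCTGAATTCG"  # hypothetical linker-derived 3' region of the primer


def design_primer(signature: str, linker: str = LINKER_3P,
                  extra_bases: str = "") -> str:
    """Primer = linker-derived region + fragment-specific bases taken from the
    signature (+ optional further bases to raise specificity)."""
    if not signature.startswith(RSAI_SITE):
        raise ValueError("signature is expected to start with the RsaI site")
    return linker + signature[RSAI_CUT:] + extra_bases


def fragments_primed(primer: str, fragments: dict[str, str]) -> list[str]:
    """Idealized perfect-match model: fragments (linker + insert, written as
    the strand the primer is identical to) that the primer would amplify."""
    return [fid for fid, seq in fragments.items() if seq.startswith(primer)]


mixture = {  # hypothetical linker-flanked RsaI 3' fragments
    "f1": LINKER_3P + "ACAGTAGGATC",
    "f2": LINKER_3P + "ACAGTTCCAA",
    "f3": LINKER_3P + "ACAGTACTTG",
}
primer = design_primer("GTACAGTA")
print(fragments_primed(primer, mixture))  # ['f1', 'f3']
print(fragments_primed(design_primer("GTACAGTA", extra_bases="G"), mixture))  # ['f1']
```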

In a preferred application of the method according to the invention, the signatures obtained for nucleic acid fragments of interest are used to design fragment-specific oligonucleotide primers. In this application, it is further preferred to use the oligonucleotide primers obtained for the amplification of selected fragments, the mixture of nucleic acid fragments or a subset thereof being generally used as the amplification template.

The genes belonging to the nucleic acid or cDNA fragments of interest can be identified by searching electronic databases if the information content of a signature is high enough to permit clear or largely unambiguous identification of a gene and if the database has corresponding entries. How high the information content of signatures of a biological species has to be in order to allow a clear assignment of a signature to the associated gene can be determined empirically and can even differ from gene to gene within a biological species; for example, a particular decamer (a 10-nucleotide signature) may be characteristic of a single gene, while another decamer appears in many different genes. In a preferred application of the method according to the invention, the signatures obtained for nucleic acid fragments of interest are used to identify the nucleic acid fragments in a database search.
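The empirical question of whether a given decamer points to one gene or to many can be answered with a k-mer index over the available sequences. A minimal sketch, assuming the database is simply a dict of gene sequences searched on one strand. The gene names and sequences are hypothetical. Real database searches would use dedicated tools.

```python
from collections import defaultdict


def build_kmer_index(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer of every gene sequence (one strand) to the genes
    containing it."""
    index = defaultdict(set)
    for gene_id, seq in genes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene_id)
    return dict(index)


def genes_for_signature(index: dict[str, set[str]], signature: str) -> set[str]:
    return index.get(signature, set())


genes = {  # hypothetical sequences sharing one decamer
    "geneA": "ATGCGTACCGTTAGC",
    "geneB": "GGTGCGTACCGTCCC",
}
index = build_kmer_index(genes, k=10)
print(sorted(genes_for_signature(index, "TGCGTACCGT")))  # ['geneA', 'geneB']
print(sorted(genes_for_signature(index, "GCGTACCGTT")))  # ['geneA']
```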

In a further preferred application of the method according to the invention, the signatures obtained for nucleic acid fragments of interest are used to create EST libraries. For this purpose, the signatures obtained for the individual fragments from a cDNA preparation are used to design fragment-specific oligonucleotide primers. These primers are then used to obtain the respective fragments by PCR amplification. Finally, the fragments obtained are sequenced and the sequences are recorded in a database. EST libraries generated in this way can also be called normalized EST libraries, because each fragment is generated only once, regardless of its abundance or the abundance of the mRNA or cDNA molecules it represents. This is a great advantage over EST libraries produced according to the prior art, which show an extremely high level of redundancy (cf. Lee et al., Proc. Natl. Acad. Sci. USA 92: 8303-8307 [1995]). In conventionally produced EST libraries, this redundancy arises because abundant transcripts are represented by significantly more cDNA clones than less abundant transcripts. For example, a transcript at 1000 mRNA copies per cell and one at 1 copy per cell would be represented by clones at a frequency ratio of 1000:1. Methods for normalizing cDNA libraries are also known from the prior art. In these methods, the frequencies of abundant and less abundant clones are equalized by exploiting the reassociation kinetics of nucleic acids (Soares et al., Proc. Natl. Acad. Sci. USA 91: 9228-9232 [1994]). Such normalized libraries are depleted of particularly abundant clones. However, the difference in abundance between more and less frequent clones is still considerable and can amount to one to two orders of magnitude, which makes producing and analyzing such libraries very expensive.
When normalized banks are produced with the method according to the invention, redundancy can practically be excluded; however, in contrast to normalized banks according to the prior art, the abundance information of the individual fragments, clones and the transcripts from which they derive is not lost. Rather, abundance information can be obtained from the respective separation, for example from the signal strength of the individual fragments of the examined fragment mixture in a capillary gel electrophoretic separation, and can be added to each EST sequence obtained as additional information.
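One way to keep abundance information attached to each EST is to store it in the record alongside the signature. The sketch assumes that a fragment-specific primer is formed by extending the adapter-strand primer ML20 (Example 1) with the fragment's signature, which presumes that ML20 is the adapter strand joined to the fragment's GATC end. The class, the function and the intensity value are hypothetical; the signature and approximate length are those of Fig. 7.

```python
from dataclasses import dataclass

# Adapter-strand primer ML20 from Example 1 (5'->3')
ML20 = "TCACATGCTAAGTCTCGCGA"


@dataclass
class EstEntry:
    signature: str            # bases read from the MboI end of the fragment
    fragment_length: float    # bp, from capillary electrophoresis
    signal_intensity: float   # abundance proxy stored with the EST
    sequence: str = ""        # filled in once the PCR product has been sequenced


def specific_primer(signature: str, adapter_primer: str = ML20) -> str:
    """Hypothetical fragment-specific primer: adapter strand extended by the signature."""
    return adapter_primer + signature


entry = EstEntry("GATCTCACAAATGGTT", 140.0, 1250.0)  # intensity value is illustrative
print(specific_primer(entry.signature))
```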

In a further preferred application of the method according to the invention, mixtures of restriction fragments generated from genomic DNA or cDNA and flanked on both sides by identical or different adapters are used as the mixture of nucleic acid fragments. The adapter-flanked fragments are first amplified with primers whose 3' ends extend beyond the adapter-complementary region by one or more nucleotides, and the amplification products thus obtained are used to carry out the method.
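How selective 3' nucleotides split a fragment mixture into subpools can be sketched as follows. Each fragment is represented by the strand sequence that begins immediately after the adapter-complementary region (the strand the primer is identical to), so a primer with selective bases X1...Xk amplifies exactly the fragments starting with those bases. This idealizes priming (no mismatch tolerance); the function name and the example sequences are hypothetical. Restricting X1 to G, A or C with X2 free gives 12 pools, matching the 12 CP31X₁X₂ primers of Example 2.

```python
from itertools import product


def subpools(inserts: dict[str, str], k: int = 2,
             first_bases: str = "ACGT") -> dict[str, list[str]]:
    """Group fragments by the k selective bases a primer would have to match.

    inserts maps a fragment name to the strand sequence starting right after the
    adapter-complementary region; first_bases restricts the first selective base.
    """
    keys = ["".join(p) for p in product(first_bases, *["ACGT"] * (k - 1))]
    pools = {key: [] for key in keys}
    for name, seq in inserts.items():
        if seq[:k] in pools:          # fragments matching no primer stay unamplified
            pools[seq[:k]].append(name)
    return pools


example = {"f1": "GATTC", "f2": "GACCA", "f3": "TTAGG"}
pools = subpools(example)
print(len(pools), pools["GA"], pools["TT"])
```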

In a further embodiment of the method according to the invention, the mixture of nucleic acid fragments consists of fragments that were generated from genomic DNA or cDNA by restriction digestion with at least one type IIS restriction endonuclease and that are flanked on one or both sides by adapter sequences. In this application, the type IIS restriction endonucleases used produce overhanging ends whose sequence is determined not by the restriction endonuclease itself but by the nucleic acid sequence at the cleavage site, and which can therefore differ from fragment to fragment. If desired, adapters can be used that can be attached only to certain overhanging ends, in particular to those whose nucleotide sequence is complementary to that of the overhanging adapter ends. In this way, preselected adapters can be attached to only some of the nucleic acid fragments, thereby generating a subset of the mixture of nucleic acid fragments used (so-called "molecular indexing", see Kato, Nucleic Acids Res. 1996 Jan 15; 24(2): 394-395, and WO 94/01582).
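The overhang produced by a type IIS enzyme, and the subset selected by overhang-specific adapters, can be sketched as below. Enzymes are described in the conventional SITE(top/bottom) notation, e.g. FokI as GGATG(9/13). The function names, the example sequences, searching only the top strand, and representing an adapter by the set of top-strand overhang sequences it accepts are simplifying assumptions made for illustration.

```python
def type_iis_overhang(seq: str, site: str, top_offset: int, bottom_offset: int):
    """Overhang left by an enzyme written as SITE(top_offset/bottom_offset).

    Offsets count bases 3' of the recognition site on the top strand and on the
    complementary strand (top-strand coordinates). Returns the top-strand bases
    spanned by the overhang and its type, or None if the site is absent or the
    cut would fall beyond the fragment end.
    """
    i = seq.find(site)
    if i < 0:
        return None
    end = i + len(site)
    top_cut, bottom_cut = end + top_offset, end + bottom_offset
    lo, hi = sorted((top_cut, bottom_cut))
    if hi > len(seq):
        return None
    if bottom_cut > top_cut:
        kind = "5'"
    elif bottom_cut < top_cut:
        kind = "3'"
    else:
        kind = "blunt"
    return seq[lo:hi], kind


def molecular_index(fragments: dict[str, str], site: str, top_offset: int,
                    bottom_offset: int, accepted: set[str]) -> list[str]:
    """Fragments whose overhang is in `accepted`, mimicking adapters that ligate
    only to matching ends."""
    chosen = []
    for name, seq in fragments.items():
        cut = type_iis_overhang(seq, site, top_offset, bottom_offset)
        if cut is not None and cut[0] in accepted:
            chosen.append(name)
    return chosen


# FokI, GGATG(9/13), leaves 4-base 5' overhangs whose sequence varies per fragment
frags = {"a": "GGATG" + "A" * 9 + "CGTA" + "TTTT",
         "b": "GGATG" + "A" * 9 + "TTTT" + "GGGG"}
print(type_iis_overhang(frags["a"], "GGATG", 9, 13))      # ('CGTA', "5'")
print(molecular_index(frags, "GGATG", 9, 13, {"CGTA"}))   # ['a']
```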

In a particularly preferred embodiment, the required enzymatic reactions are set up using an automatic pipetting device.

In a further particularly preferred embodiment, the fluorograms obtained by gel electrophoresis, preferably by capillary gel electrophoresis, are evaluated automatically. In this evaluation, a computer system assigns to one another those signals of different fluorograms that belong together, namely signals representing (i) homologous fragments from different mixtures of nucleic acid fragments, (ii) fragments of a nucleic acid mixture and the reaction products obtained to identify one or more nucleotides of the fragments of this mixture, or (iii) reaction products obtained to identify several nucleotides of the fragments of a mixture of nucleic acid fragments. Such an automatic assignment can be carried out, for example, according to the following procedure:

1. Choose a suitable start signal that is not yet included in any assignment.

2. Search for the best-matching signal, the criteria being (a) the smallest possible difference in the determined fragment length and (b) the smallest possible difference in signal intensity; the two criteria can be weighted freely.

3. Repeat step (2), comparing each additional signal with the mean fragment length and mean signal intensity of all signals assigned so far.

4. Stop the process if the differences from (2) exceed a preselected threshold.

5. Repeat steps (1) to (3) until all signals of a set of fluorograms to be assigned to one another have either been assigned to one another or been identified as not assignable.
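The assignment procedure can be sketched as a greedy grouping. The sketch assumes that peaks have already been corrected for dye- and shortening-dependent mobility, starts each group from the strongest unassigned peak of the first fluorogram (one possible reading of a "suitable" start signal), measures intensity differences relative to the group mean, and uses freely chosen weights `w_len`, `w_int` and `threshold`; all names and values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Peak:
    length: float      # fragment length in bases, after mobility corrections
    intensity: float   # peak height (or area)


def cost(peak, group, w_len=1.0, w_int=1.0):
    """Weighted distance of a peak from the mean length/intensity of a group (steps 2-3)."""
    mean_len = mean(p.length for p in group)
    mean_int = mean(p.intensity for p in group)
    rel_int = abs(peak.intensity - mean_int) / max(mean_int, 1e-12)
    return w_len * abs(peak.length - mean_len) + w_int * rel_int


def assign(fluorograms, threshold, w_len=1.0, w_int=1.0):
    """Group peaks across fluorograms; returns (groups, leftover) as index pairs."""
    free = [set(range(len(f))) for f in fluorograms]
    groups = []
    while free[0]:                                   # step 5: repeat until done
        # Step 1: strongest unassigned peak of the first fluorogram
        start = max(free[0], key=lambda i: fluorograms[0][i].intensity)
        free[0].remove(start)
        members, group = [(0, start)], [fluorograms[0][start]]
        for k in range(1, len(fluorograms)):
            if not free[k]:
                break
            # Steps 2-3: best match against the running mean of the group
            best = min(free[k], key=lambda i: cost(fluorograms[k][i], group, w_len, w_int))
            if cost(fluorograms[k][best], group, w_len, w_int) > threshold:
                break                                # step 4: stop extending this group
            free[k].remove(best)
            members.append((k, best))
            group.append(fluorograms[k][best])
        groups.append(members)
    leftover = [(k, i) for k in range(len(free)) for i in sorted(free[k])]
    return groups, leftover


parent_run = [Peak(100.0, 50.0), Peak(150.0, 20.0)]
product_run = [Peak(150.4, 22.0), Peak(100.3, 48.0), Peak(300.0, 5.0)]
groups, leftover = assign([parent_run, product_run], threshold=1.0)
print(groups, leftover)
```

Peaks left in `leftover` correspond to signals identified as not assignable in step 5.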

It is further preferred that the automatic evaluation carries out steps (d1), (e1), (g1), (h1), (i1), (j1), (k1), (m1), (c2), (d2), (e2), (f2), (g2), (h2), (i2), (l2), (m2), (n2) and/or (o2).

The invention is explained in more detail below with reference to the drawings, which show:

Fig. 1: the generation of adapter-flanked nucleic acid fragments,

Fig. 2: the sequencing of overhanging fragment ends by means of adapter ligation,

Fig. 3: the generation of various overhanging ends by shortening a nucleic acid fragment,

Fig. 4: the identification of one nucleotide for all fragments of a mixture of nucleic acid fragments,

Fig. 5: the identification of four nucleotides for all fragments of a mixture of nucleic acid fragments,

Fig. 6: the separation of a mixture of nucleic acid fragments by capillary gel electrophoresis,

Fig. 7: the identification of several nucleotides of a nucleic acid fragment by capillary gel electrophoresis,

Fig. 8: a list of some signatures obtained from a suspension culture of Saccharomyces cerevisiae,

Fig. 9: the identification of several nucleotides of four nucleic acid fragments of a mixture of nucleic acid fragments.

Fig. 1 shows the generation of adapter-flanked nucleic acid fragments, wherein

(1) shows the fragmentation of a nucleic acid preparation by means of two restriction endonucleases, and

(2) illustrates the attachment of adapters to the fragment ends.

Fig. 2 shows the sequencing of overhanging fragment ends by means of adapter ligation, wherein

(1) shows sequencing of the first position of the overhanging ends, and

(2) shows sequencing of the second position of the overhanging ends. The sequencing of a nucleic acid fragment representing the 3' end of a cDNA molecule is shown. The adapters used for sequencing differ in the sequence of their overhanging ends and carry different marker groups that encode the sequence of the respective overhanging end. A marker group indicating the base A is represented by a dotted adapter, a marker indicating C by a hatched adapter, a marker indicating G by a filled adapter, and a marker indicating T by a cross-hatched adapter. A marker group that is attached to the fragment by ligation in (1) and indicates a T shows that the first base of the overhang is the complementary base A. A marker group that is attached to the fragment by ligation in (2) and indicates a C shows that the second base of the overhang is the complementary base G.
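Decoding a series of ligation rounds into overhang bases amounts to translating each marker into the base it indicates and taking the complement. The marker-to-base table follows the description of Fig. 2; the function name is hypothetical.

```python
# Marker pattern of each sequencing adapter -> base it indicates (as described for Fig. 2)
MARKER_BASE = {"dotted": "A", "hatched": "C", "filled": "G", "cross-hatched": "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_overhang(markers: list[str]) -> str:
    """Return the fragment's overhang bases, one per ligation round.

    The adapter that ligates carries the base complementary to the fragment base,
    so each marker is decoded and then complemented.
    """
    return "".join(COMPLEMENT[MARKER_BASE[m]] for m in markers)


# Fig. 2: round (1) ligates a cross-hatched (T) adapter, round (2) a hatched (C) adapter
print(read_overhang(["cross-hatched", "hatched"]))  # AG
```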

Fig. 3 shows the generation of various overhanging ends by shortening a nucleic acid fragment, wherein

(1) shows the attachment of three different adapters, each carrying a recognition site (hatched area) for a type IIS restriction endonuclease at a different position,

(2) shows the incubation of the ligation products with said type IIS restriction endonuclease, and

(3) shows the release of shortened fragments whose terminal single-stranded overhangs contain positions -5 and -6 (left), -3 and -4 (middle) and -1 and -2 (right), relative to the double-stranded region of the starting fragment, and which are thus accessible to sequencing via adapter ligation. The starting fragment shown is a 3' cDNA fragment obtained with the restriction endonuclease MboI.

Fig. 4 describes the identification of one base for all fragments of a mixture of nucleic acid fragments. The fragments are provided with fluorescent labeling groups and separated according to their mobility by capillary gel electrophoresis. The resulting fluorogram (shown at the top) is used to catalog the fragments, i.e. to assign them consecutive numbers. The nucleotides at the fragment position to be determined are then identified as described above. After the corresponding reactions have been carried out, in which the identity of said nucleotides is encoded by introducing nucleotide-specific labeling groups, the products are likewise separated by capillary gel electrophoresis, and the identity of the introduced labeling groups is determined, where appropriate taking mobility and signal intensity into account. The base of interest is identified as "G" for fragment 3, "A" for fragment 2, "T" for fragments 1 and 6, and "C" for fragments 4, 5 and 7.

Fig. 5 shows the identification of four nucleotides for all fragments of a mixture of nucleic acid fragments (fragments 1-7). Reading the four nucleotides in direct succession gives the following sequence signatures:

Fragment 1: TGTA

Fragment 2: ATGA

Fragment 3: GATG

Fragment 4: CCGT

Fragment 5: CACC

Fragment 6: TGAT

Fragment 7: CTCC
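Combining per-round base calls into fragment signatures (step (cd) of claim 6) is a simple concatenation. In the sketch, each round is a mapping from fragment number to called base; the round-1 calls are those of Fig. 4, rounds 2 to 4 are read off the Fig. 5 signatures, and writing `N` for a missing call is an assumption, as are the helper names.

```python
def build_signatures(rounds: list[dict[int, str]]) -> dict[int, str]:
    """Concatenate per-round base calls into one signature per fragment.

    rounds[i][fragment] is the base called for that fragment in round i + 1;
    a missing call is written as 'N'.
    """
    fragments = sorted(set().union(*rounds))
    return {f: "".join(r.get(f, "N") for r in rounds) for f in fragments}


def calls(bases: str) -> dict[int, str]:
    """Helper: calls for fragments 1, 2, 3, ... given as one string."""
    return dict(enumerate(bases, start=1))


# Calls for fragments 1-7 in rounds 1-4 (round 1 corresponds to Fig. 4)
rounds = [calls("TAGCCTC"), calls("GTACAGT"), calls("TGTGCAC"), calls("AAGTCTC")]
print(build_signatures(rounds))
```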

Fig. 6 shows the separation of a mixture of nucleic acid fragments by capillary gel electrophoresis. cDNA fragments were generated as described from a suspension culture of Saccharomyces cerevisiae. The signals obtained from a culture in the stationary phase (gray) and from a culture in the logarithmic phase (black) are shown. Some of the fragments represent constitutively expressed genes (signals marked "C" = constitutively expressed), others represent genes downregulated in the stationary phase (signals marked "D" = downregulated), and still others represent genes upregulated in the stationary phase (signals marked "U" = upregulated). The horizontal scale shows the fragment size, and the vertical scale shows the fluorescence intensity.

Fig. 7 shows the identification of several nucleotides of a nucleic acid fragment by capillary gel electrophoresis. E, one of the fragments of a mixture of nucleic acid fragments; B1-B16, identification of the first to sixteenth base of the fragment; FAM, PET, VIC, NED, the fluorophore detected in each case during the identification of a base; (G), (A), (T), (C), the base identified by means of the respective fluorophore. The signature GATCTCACAAATGGTT results for the selected fragment. The scale at the top shows the fragment size; the fragment is approximately 140 bp long.

Fig. 8 shows a list of some signatures obtained from a suspension culture of Saccharomyces cerevisiae. For each entry, the fragment size, the signature determined by the method according to the invention, the open reading frame (ORF) identified by BLAST analysis and the signal intensity obtained by capillary gel electrophoresis are given.

Fig. 9 shows the identification of several nucleotides of four nucleic acid fragments of a mixture of nucleic acid fragments. The fragments are approximately 75 bp, 77 bp, 78 bp and 79 bp long. F, separated fragments of the mixture; B1-B6, identification of the first to sixth base of the fragments; FAM, PET, VIC, NED, the fluorophore detected in each case when identifying a base; (G), (A), (T), (C), the base identified by means of the respective fluorophore. The signature TCATTG results for the 75 bp fragment, the signature ACTGGC for the 77 bp fragment, the signature ATGCCT for the 78 bp fragment, and the signature TATGCT for the 79 bp fragment.

The invention is further illustrated by the following examples.

Example 1: Preparation of cDNA 3' restriction fragments

25 μg of total RNA from a suspension culture of Saccharomyces cerevisiae was precipitated with ethanol and dissolved in 15.5 μl of water. 0.5 μl of 10 μM cDNA primer CP31V (5'-ACCTACGTGCAGATTTTTTTTTTTTTTTTTTV-3', SEQ ID NO: 1) was added, and the mixture was denatured for 5 minutes at 65 °C and placed on ice. The mixture was combined with 3 μl 100 mM dithiothreitol (Life Technologies GmbH, Karlsruhe), 6 μl 5x Superscript buffer (Life Technologies GmbH, Karlsruhe), 1.5 μl 10 mM dNTPs, 0.6 μl RNase inhibitor (40 U/μl; Roche Molecular Biochemicals) and 1 μl Superscript II (200 U/μl, Life Technologies) and incubated at 42 °C for 1 hour for cDNA first-strand synthesis. For second-strand synthesis, 48 μl of second-strand buffer (see Ausubel et al., Current Protocols in Molecular Biology (1999), John Wiley & Sons), 3.6 μl of 10 mM dNTPs, 148.8 μl of H₂O, 1.2 μl RNase H (1.5 U/μl, Promega) and 6 μl DNA polymerase I (New England Biolabs GmbH, Schwalbach, 10 U/μl) were added, and the reactions were incubated at 22 °C for 2 hours. The mixture was extracted with 100 μl phenol, then with 100 μl chloroform, and precipitated with 0.1 vol. sodium acetate pH 5.2 and 2.5 vol. ethanol. After centrifugation for 20 minutes at 15,000 g and washing with 70% ethanol, the pellet was dissolved in a restriction mixture consisting of 15 μl 10x universal buffer, 1 μl MboI and 84 μl H₂O, and the reaction was incubated at 37 °C for 1 hour. The mixture was extracted with phenol, then with chloroform, and precipitated with ethanol. The pellet was dissolved in a ligation mixture of 0.6 μl 10x ligation buffer (Roche Molecular Biochemicals), 1 μl 10 mM ATP (Roche Molecular Biochemicals), 1 μl linker ML2025 (produced by hybridization of the oligonucleotides ML20 (5'-TCACATGCTAAGTCTCGCGA-3', SEQ ID NO: 2) and LM25 (5'-GATCTCGCGAGACTTAGCATGTGAC-3', SEQ ID NO: 3)), 6.9 μl H₂O and 0.5 μl T4 DNA ligase (1 U/μl; Roche Molecular Biochemicals), and the ligation was performed overnight at 16 °C.
The ligation reaction was made up to 100 μl with water, extracted with phenol, then with chloroform, and, after addition of 1 μl glycogen (20 mg/ml, Roche Molecular Biochemicals), precipitated with 100 μl 28% polyethylene glycol 8000 (Promega)/10 mM MgCl₂. The pellet was washed with 70% ethanol and taken up in 40 μl water.

Example 2: Amplification of cDNA 3' restriction fragments divided into subpools

For the first round of amplification, PCR batches were prepared containing 2 μl of the ligation reaction from Example 1, 2 μl of 10x PCR buffer (670 mM Tris-Cl, pH 8.8, 170 mM (NH₄)₂SO₄, 1% (v/v) Tween 20), 1.5 μl 20 mM MgCl₂, 0.4 μl 10 mM dNTPs, 2 μl RediLoad (Invitrogen GmbH, Karlsruhe), 0.2 μl Taq DNA polymerase (Roche Molecular Biochemicals), 1 μl 4 μM oligonucleotide primer CP31X₁X₂ (5'-ACCTACGTGCAGATTTTTTTTTTTTTTTTTTX₁X₂-3', with X₁ = G, A or C and X₂ = G, A, T or C; SEQ ID NO: 4), 1 μl oligonucleotide primer ML20 and 9.9 μl water. With all 12 reactions (each containing one of the 12 possible CP31X₁X₂ primers), 25 amplification cycles were carried out, each consisting of denaturation (30 sec, 94 °C), annealing (30 sec, 65 °C) and extension (2 min, 72 °C). 5 μl of each reaction was checked by electrophoresis on a 1.5% agarose gel. The reactions were diluted to 100 μl with water. Further PCR batches were prepared containing 2 μl diluted amplification reaction, 2 μl 10x PCR buffer, 1.5 μl 20 mM MgCl₂, 0.4 μl 10 mM dNTPs, 2 μl RediLoad, 0.2 μl Taq DNA polymerase, 1 μl 4 μM oligonucleotide primer CP31VNX₃X₄ (5'-ACCTACGTGCAGATTTTTTTTTTTTTTTTTTVNX₃X₄-3', with V = mixture of G, A and C, N = mixture of G, A, T and C, and X₃, X₄ = G, A, T or C; SEQ ID NO: 5), 1 μl 4 μM oligonucleotide primer ML20 and 9.9 μl water. Depending on the planned further processing of the batches, primer ML20 carried a fluorescent label (selected from one of the dye sets 5'-FAM, 5'-JOE, 5'-ROX and 5'-TAMRA [dye set 1] or 5'-FAM, 5'-VIC, 5'-NED and 5'-PET [dye set 2]; further processing of the samples according to Example 3), or ML20 was used unlabeled (further processing of the samples according to Example 4 or Example 5). With all 2 x 192 reactions (each containing a dilution of one of the 12 possible first-round amplification reactions and one of the 16 possible CP31VNX₃X₄ primers; 12 x 16 = 192; ML20 labeled or unlabeled), 25 amplification cycles were carried out, each consisting of denaturation (30 sec, 94 °C), annealing (30 sec, 65 °C) and extension (2 min, 72 °C). Again, 5 μl of each reaction was checked by agarose gel electrophoresis.
The remaining reaction batches were purified using QIAquick columns (Qiagen AG, Hilden) according to the manufacturer's instructions; elution was carried out in 50 μl water. The amount of DNA was determined photometrically.

Example 3: Separation and display of the fluorescence-labeled amplification products by capillary gel electrophoresis

2 μl of each of the purified fluorescence-labeled amplification products from Example 2 were diluted with 10 μl water and (when dye set 2 was used, after addition of 0.5 μl GeneScan 500 LIZ length standard [Applied Biosystems GmbH, Weiterstadt]) separated by capillary gel electrophoresis on an ABI Prism 3100 Genetic Analyzer (Applied Biosystems). To increase the throughput of the instrument by "multiplexing", further batches were prepared in which 1 μl each of FAM-labeled, VIC-labeled, NED-labeled and PET-labeled amplification products were combined with 0.5 μl LIZ length standard and 7.5 μl water and used for electrophoresis. Multiplexing with dye set 1 was carried out analogously; fragments labeled with FAM, JOE or TAMRA were mixed with the GeneScan 500 ROX length standard. The fluorograms were displayed and evaluated with GeneScan software version 3.7 for Windows NT (Applied Biosystems). To identify differentially expressed genes, fluorograms obtained with RNA preparations from yeast cells in different growth stages, but with the same amplification primers in the first and second rounds of amplification, were compared with one another. For this purpose, the fluorograms were aligned using GeneScan and differences were detected visually. For comparisons of this type, the GeneScan function "align data by size" was first used to ensure that "matching" fragments (i.e. fragments representing the same gene/transcript) from RNA preparations of different growth stages could be assigned to one another. In the next step, the signal strengths were normalized by adjusting the average signal height of one sample to the average signal strength of the sample it was to be compared with. To identify differentially expressed genes, fragments of identical size, and thus identical transcripts, that occurred in the compared samples and whose intensities differed after normalization by at least a preselected factor were tabulated together with the signature determined; in some cases, the associated values for fragment length (determined using the internal length standard), the signal intensity and the information about the amplification primers used were also included. For general transcriptome analysis (i.e. an "inventory" of expressed genes), all the signatures determined were tabulated regardless of their relative signal strengths.
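The normalization and fold-change filter described above can be sketched as follows. The sketch assumes fragments have already been matched by size (so a size value identifies the same transcript in both samples), scales one sample so that its mean signal height equals that of the other, and reports size-matched fragments whose ratio reaches a preselected factor. The function names, the factor of 2 and the intensity values are illustrative only.

```python
from statistics import mean


def normalize(sample: dict[int, float], reference: dict[int, float]) -> dict[int, float]:
    """Scale a sample so that its mean signal height equals the reference's mean height."""
    factor = mean(reference.values()) / mean(sample.values())
    return {size: height * factor for size, height in sample.items()}


def differential(ref: dict[int, float], other: dict[int, float], min_fold: float = 2.0):
    """Fold changes (other / ref) of size-matched fragments differing by >= min_fold."""
    other_norm = normalize(other, ref)
    hits = {}
    for size in sorted(ref.keys() & other_norm.keys()):  # fragments present in both
        ratio = other_norm[size] / ref[size]
        if ratio >= min_fold or ratio <= 1 / min_fold:
            hits[size] = ratio
    return hits


log_phase = {100: 50.0, 120: 50.0, 140: 50.0, 160: 5.0}
stationary = {100: 50.0, 120: 50.0, 140: 50.0, 160: 25.0}
hits = differential(log_phase, stationary)
print(hits)
```

Because the scaling uses the mean, strongly changed fragments shift the factor applied to all others; here the slight rescaling of the unchanged fragments stays below the threshold and only the 160-base fragment is reported.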

Example 4: Determination of terminal bases via ligation

1 μg of each of the purified, non-fluorescence-labeled amplification products from Example 2 was mixed with 5 μl 10x NEBuffer 3 and made up to 49 μl with water. 1 μl MboI (5 U/μl, New England Biolabs) was added, and the mixture was incubated for 1 h at 37 °C and then heat-inactivated for 20 min at 65 °C. The reactions were extracted with TE-saturated phenol, then with chloroform, and precipitated with ethanol. The pellets were taken up in 20 μl of a ligation batch containing 1.2 μl 10x ligation buffer (Roche), 8 μl 0.5 μg/μl Eco57I linker (a linker selected from ECO1/2 to ECO11/12; see Table 1; the linkers were produced by hybridization of the respectively stated complementary oligonucleotides) and 1 μl T4 DNA ligase (1 U/μl, Roche). Ligation was carried out at 16 °C overnight. To amplify the ligation products, 2 μl of the ligation mixture were mixed with 2 μl of 10 μM amplification primer 1 (identical in sequence to the strand of the Eco57I linker whose 3' end had been joined to the MboI-cut fragments), 2 μl of 10 μM CP31V, 5 μl 10x Advantage 2 buffer (Clontech/BD Biosciences Europe, Heidelberg), 1 μl 10 mM dNTPs, 37 μl water and 1 μl 50x Advantage 2 DNA Polymerase Mix (Clontech) and amplified under the following conditions: initial denaturation for 2 min at 94 °C, then 25 cycles of 20 sec denaturation at 94 °C, 30 sec annealing at 65 °C and 2 min extension at 72 °C. After the amplification had been checked by agarose gel electrophoresis, 10 μl of the amplification products were mixed with 2.5 μl buffer G+ + SAM (Fermentas GmbH, St. Leon-Rot), 0.25 μl 10 mg/ml BSA, 10.65 μl water and 1.6 μl Eco57I (5 U/μl). The mixture was incubated at 37 °C for 1 h and then heat-inactivated for 20 min at 65 °C. 6.5 μl of this reaction were mixed with 1 μl 20 mM ATP, 2 μl 0.5 μg/μl sequencing adapter SO15NX or SO15XN (cf. Table 2; the linkers were produced by hybridization of the respectively stated complementary oligonucleotides) and 0.5 μl T4 DNA ligase (1 U/μl; Roche) and incubated overnight at 16 °C. The reactions were diluted to 50 μl with water and purified using QIAquick columns, with elution in 25 μl water. 2.5 μl of each of the purified amplification batches from Example 2 were diluted with 9.5 μl water and, after addition of 0.5 μl GeneScan 500 LIZ length standard (Applied Biosystems GmbH, Weiterstadt), separated by capillary gel electrophoresis on the ABI 3100. For evaluation, the fluorograms from Example 3 were compared with the associated fluorograms from Example 4. To identify signals in the compared fluorograms that represent the same fragment species, (1) corrections for the fluorophore-specific migration behavior and (2) corrections for the fragment shortening, which increases with the position of the base to be determined, were applied. For example, the length of a fragment in which bases 3 and 4, counted from the original MboI recognition site, had been converted into a single-stranded overhang by Eco57I cleavage was corrected arithmetically by +4 bases, and the length of a fragment in which bases 5 and 6, counted from the original MboI recognition site, had been converted into a single-stranded overhang by Eco57I cleavage was corrected arithmetically by +6 bases. All signals belonging to one fragment species (i.e. a fragment occurring in Example 3 and the associated products from Example 4 that had been shortened with Eco57I and provided with a sequencing adapter) were then assigned to one another and recorded in tabular form, the base identity being determined in each case from the respective fluorophore. Such a table can, for example, have the format given in Table 3.
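The length correction used to match shortened products to their parent fragments can be sketched as below. The +4 and +6 base corrections for doublets 2 and 3 are taken from the text; extending the rule to +2j for doublet j, the optional per-fluorophore correction term and the matching tolerance are assumptions, and all names are hypothetical.

```python
def exposed_bases(j: int) -> tuple[int, int]:
    """Bases, counted from the original MboI site, in the overhang of doublet j."""
    return (2 * j - 1, 2 * j)


def corrected_length(measured: float, j: int, dye_correction: float = 0.0) -> float:
    """Add back the 2*j bases removed by Eco57I cleavage, plus an empirically
    determined, fluorophore-specific mobility correction (hypothetical parameter)."""
    return measured + 2 * j + dye_correction


def match_parent(length: float, parent_lengths: list[float], tolerance: float = 0.5):
    """Closest Example 3 fragment length, or None if none lies within tolerance."""
    best = min(parent_lengths, key=lambda p: abs(p - length))
    return best if abs(best - length) <= tolerance else None


# Doublet 2 (bases 3 and 4): a product measured at 136.2 bases is matched via +4 bases
print(exposed_bases(2), match_parent(corrected_length(136.2, 2), [120.0, 140.0, 160.1]))
```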

The partial cDNA sequences ("signatures") obtained in this way were used in BLAST searches to identify the corresponding genes. The cDNA signature GATCTAGACAACCAAA shown in Table 3, for example, identified the yeast gene KTR4 (ORF YBR199W), which encodes a putative alpha-1,2-mannosyltransferase. Fig. 8 shows further examples of signatures obtained from yeast.

Table 1:

Table 2:

Table 3:

Double-stranded portion of the fragment after arithmetic removal of the linker, to correct for the contribution of the fluorophore to the electrophoretic fragment mobility. The numerical values in this example relate to the use of Eco57I, which generates two-base overhangs for identifying two neighboring bases ("doublets"), and of sequencing adapters that identify either the first or the second base of such an overhang. To identify several successive doublets, the recognition sites for Eco57I in the Eco57I linkers are each offset by two bases. Reaction according to Example 3; results from the known recognition site of MboI (cf. Example 1).

Example 5: Determination of terminal bases via fill-in reaction

1 μg of each of the purified, non-fluorescence-labeled amplification products from Example 2 was mixed with 5 μl 10x NEBuffer 3 and made up to 49 μl with water. 1 μl MboI (5 U/μl, New England Biolabs) was added, and the mixture was incubated for 1 h at 37 °C and then heat-inactivated for 20 min at 65 °C. The reactions were extracted with TE-saturated phenol, then with chloroform, and precipitated with ethanol. The pellets were taken up in 50 μl Mung Bean Nuclease buffer (New England Biolabs). After addition of 1 μl Mung Bean Nuclease (1 U/μl, New England Biolabs), the mixture was incubated for 30 min at 30 °C. 1 μl of 0.5 M EDTA was added, and the mixture was extracted with phenol, then with chloroform, and precipitated with ethanol. The pellet was dissolved in a ligation mixture consisting of 7.5 μl 2x ligation buffer (New England Biolabs), 6.5 μl 0.5 μg/μl BceAI linker (a linker selected from BCE1 to BCE13; see Table 1; the linkers were produced by hybridization of the respectively stated complementary oligonucleotides) and 2 μl Quick T4 DNA ligase (New England Biolabs), and ligation was carried out for 1 h at room temperature. To amplify the ligation products, 2 μl of the ligation were mixed with 2 μl of 10 μM amplification primer 2 (identical in sequence to the strand of the BceAI linker whose 3' end had been joined to the MboI-cut fragments), 2 μl of 10 μM CP31, 5 μl 10x Advantage 2 buffer, 1 μl 10 mM dNTPs, 37 μl water and 1 μl 50x Advantage 2 DNA Polymerase Mix and amplified under the following conditions: initial denaturation for 2 min at 94 °C, then 25 cycles of 20 sec denaturation at 94 °C, 30 sec annealing at 65 °C and 2 min extension at 72 °C. After the amplification had been checked by agarose gel electrophoresis, 10 μl of the amplification products were mixed with 3 μl NEBuffer BceAI (New England Biolabs), 0.3 μl 10 mg/ml BSA, 13.7 μl water and 3 μl BceAI (1 U/μl). The mixture was incubated at 37 °C for 4 h and then heat-inactivated for 20 min at 65 °C.
9 μl of this reaction were mixed with 1 μl ddNTP mix (10 mM FAM-ddATP, JOE-ddTTP, ROX-ddATP and TAMRA-ddCTP, PerkinElmer Life Sciences Inc., Boston) and 0.5 μl Klenow polymerase (5 U/μl, New England Biolabs) and incubated for 5 min at 37 °C. After the reaction had been stopped with EDTA and heat-denatured for 20 min at 75 °C, the mixture was diluted to 50 μl with water and purified using QIAquick columns. Separation, evaluation and analysis of the data were carried out analogously to Example 4.

Claims


1. A method for analyzing nucleic acid fragments, comprising the steps of: a) providing at least one mixture of nucleic acid fragments which have at least one recognition site for a restriction endonuclease that cuts outside its recognition site, b) incubating at least a subset of the mixture of nucleic acid fragments from step (a) with at least one restriction endonuclease whose cleavage site lies outside its recognition site, c) identifying one or more nucleotides of the cut nucleic acid fragments from (b) and optionally identifying further fragment-specific properties of the cut nucleic acid fragments from (b), this identification or these identifications being carried out simultaneously for several or all nucleic acid fragments.

2. A method for analyzing nucleic acid fragments, comprising the steps:

(a) providing a mixture of nucleic acid fragments which have at least one recognition site for a restriction endonuclease that cuts outside its recognition site,

(b) incubating at least a subset of the mixture of nucleic acid fragments from step (a) with at least one restriction endonuclease whose cleavage site lies outside its recognition site and which produces overhanging ends of known position and length but unknown sequence,

(c) identifying one or more nucleotides of each of the overhanging ends of the cut nucleic acid fragments from (b) and optionally identifying further fragment-specific properties of the cut nucleic acid fragments from (b), this identification or these identifications being carried out simultaneously for several or all nucleic acid fragments.

3. A method according to claim 1 or 2, characterized in that, as part of the identification in step (c), the cut nucleic acid fragments are additionally separated according to fragment-specific properties.

4. A method according to claim 3, characterized in that the cut nucleic acid fragments are separated according to fragment-specific properties by gel electrophoresis.

5. A method according to claim 4, characterized in that the separation is carried out by capillary electrophoresis.

6. A method according to any one of claims 1 to 5, characterized in that process step (c) comprises the following individual steps (ca) to (cd):

ca) identifying a first nucleotide of the cut nucleic acid fragments from (b), the identification being carried out simultaneously for several or all nucleic acid fragments,

cb) optionally identifying a further nucleotide of the cut nucleic acid fragments from (b), the identification being carried out simultaneously for several or all nucleic acid fragments,

cc) optionally repeating step (cb) until the desired number of nucleotides has been identified,

cd) combining the sequence information obtained in steps (ca) to (cc) for a selected group of, or all, nucleic acid fragments into fragment-specific signatures, wherein a signature can contain further information about the respective fragment in addition to the sequence information, and wherein the nucleotide identification in steps (ca) to (cc) optionally also includes separation of the nucleic acid fragments of the mixture.

7. A method according to any one of claims 1 to 6, characterized in that a subset of the mixture of nucleic acid fragments provided in step (a), which differs from the subset to be incubated in step (b), is subjected to the following process steps (aa) to (ad): aa) separating the mixture of nucleic acid fragments according to at least one fragment-specific property, ab) optionally detecting the relative frequency of some or all fragments in the mixture separated in (aa), ac) optionally comparing the information obtained in (aa) and/or (ab) on the composition of different mixtures of nucleic acid fragments from step (a), ad) optionally registering nucleic acid fragments detected in (ab) that occur with different relative frequencies in different mixtures of nucleic acid fragments,

while another subset selected from groups (I) to (III) is treated according to steps (b) and (c), wherein

I) is a further subset of the mixture of nucleic acid fragments provided in step (a),

II) is a subset of the mixture of nucleic acid fragments provided in step (a) which has previously been separated according to at least one fragment-specific property,

III) is a mixture of nucleic acid fragments that is at least partially identical to (I) or (II).

8. A method according to any one of claims 1 to 7, characterized in that, in an additional process step, at least one fragment of interest is isolated either from the mixture of nucleic acid fragments from (a) or from a mixture of nucleic acid fragments from (a) that has previously been separated according to a fragment-specific property.

9. A method according to claim 6, characterized in that, in an additional process step, at least one fragment of interest is isolated either from the mixture of nucleic acid fragments from (a) or from a mixture of nucleic acid fragments from (a) that has previously been separated according to a fragment-specific property.

10. A method according to claim 9, characterized in that, in the additional process step for isolating fragments, fragment-specific oligonucleotide primers are produced using the signatures determined in step (cd) and are then used for the specific amplification of these fragments from the mixture of nucleic acid fragments by PCR.

11. A method according to any one of claims 6 to 10, characterized in that the signatures of individual nucleic acid fragments of the fragment mixture obtained in step (cd) are used in a database search to identify these fragments.

12. A method according to any one of claims 1 to 11, characterized in that the mixture of nucleic acid fragments from (a) is a mixture of cDNA fragments or a mixture of fragments of genomic DNA.

13. A method according to any one of claims 1 to 12, characterized in that the mixture of nucleic acid fragments from (a) comprises restriction fragments that have resulted from incubating a nucleic acid mixture with at least one restriction enzyme.

A method according to claim 13, characterized in that at least one further subset is provided as a mixture of nucleic acid fragments from (a), which is produced by the following steps:

i) flanking the restriction fragments of the mixture on both sides with identical or different adapters;

ii) hybridization of the fragments from step (i) with in each case different primers, all of which have regions complementary to the adapters from step (i) and each of which has, at its 3' end, one or more nucleotides that protrude beyond the adapter-complementary region and are complementary to a subset of the fragments from the nucleic acid mixture from (a);

iii) sequence-specific extension of the primers from (ii) and, where appropriate, subsequent PCR amplification of the nucleic acid fragments from the fragment mixture that had been extended in a sequence-specific manner in step (ii).
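The subset selection in claim 14 can be sketched as follows. The sketch assumes an idealized polymerase that extends a primer only if all of its protruding 3' nucleotides pair with the fragment. Under that assumption, a primer with k selective nucleotides picks out exactly the fragments whose first k insert nucleotides, read from the adapter junction, equal its selective extension. `selective_extensions` and `partition_by_selective_bases` are hypothetical names.

```python
from itertools import product
from typing import Dict, List


def selective_extensions(k: int) -> List[str]:
    """All 4**k possible selective 3' extensions of length k."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def partition_by_selective_bases(inserts: List[str], k: int) -> Dict[str, List[str]]:
    """Assign each insert (read from the adapter junction) to the primer that selects it.

    Inserts shorter than k, or with non-ACGT bases in the first k positions,
    are selected by no primer (idealized assumption).
    """
    subsets: Dict[str, List[str]] = {ext: [] for ext in selective_extensions(k)}
    for insert in inserts:
        prefix = insert[:k]
        if prefix in subsets:
            subsets[prefix].append(insert)
    return subsets


subsets = partition_by_selective_bases(["ACGT", "ACTT", "GGAA", "A"], 2)
print(subsets["AC"], subsets["GG"])  # ['ACGT', 'ACTT'] ['GGAA']
```

With k selective nucleotides the mixture is split into up to 4^k subsets, so each subset is less complex than the starting mixture.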

15. The method according to any one of claims 1 to 14, wherein the mixture of nucleic acid fragments from step (a) is provided by ligating the respective nucleic acid fragments of the fragment mixture to be analyzed with one or more linkers that have, at at least one specific position, at least one recognition site for a restriction endonuclease whose cleavage site lies outside its recognition site.

16. The method according to claim 15, characterized in that the respective nucleic acid fragments of the fragment mixture to be analyzed are each ligated with several different linkers that differ from one another in the position of the recognition site for a restriction endonuclease whose cleavage site lies outside its recognition site.
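Why linkers that differ only in the position of the recognition site (claims 15 and 16) expose different fragment nucleotides can be shown with a small calculation. The sketch uses FokI, GGATG(9/13), only as an example of an enzyme that cuts outside its recognition site. `gap` is a hypothetical parameter: the number of linker nucleotides between the end of the site and the linker/fragment junction. Each gap places the single-stranded overhang over a different window of fragment positions.

```python
from typing import Tuple


def overhang_window(top_cut: int, bottom_cut: int, gap: int) -> Tuple[int, int]:
    """1-based fragment positions spanned by the overhang left after cleavage.

    top_cut, bottom_cut: cut distances downstream of the recognition site on the
    two strands (9 and 13 for FokI). gap: linker nucleotides between the site
    and the fragment (hypothetical parameter).
    """
    start = top_cut - gap + 1
    end = bottom_cut - gap
    if start < 1:
        raise ValueError("cut would fall inside the linker, not the fragment")
    return start, end


# Three hypothetical linkers whose FokI sites sit 9, 5 and 1 nt from the junction.
print([overhang_window(9, 13, g) for g in (9, 5, 1)])  # [(1, 4), (5, 8), (9, 12)]
```

Moving the site by the overhang length tiles the fragment into adjacent windows. This is one way to read the requirement that each repetition identify previously unidentified nucleotides.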

17. The method according to any one of claims 1 to 16, characterized in that the identification of one or more nucleotides of the cut nucleic acid fragments from (b), carried out simultaneously for several or all nucleic acid fragments, is performed by filling in overhanging ends with terminating nucleotides carrying marker groups, according to the Sanger sequencing method.

18. The method according to any one of claims 1 to 16, characterized in that in step c) the identification of one or more nucleotides of the cut nucleic acid fragments from (b) is carried out simultaneously for several or all nucleic acid fragments via the following steps (cm) to (cp):

cm) hybridization of one strand of the nucleic acid fragments from (b) with selective oligonucleotide primers whose 3'-terminal nucleotide or nucleotides can hybridize with the nucleotide or nucleotides of the respective strand that are to be sequenced; cn) extension of these selective oligonucleotide primers; cp) identification of those selective oligonucleotide primers that were extended in step (cn).
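Steps (cm) to (cp) can be sketched for a single query position. The sketch assumes an idealized polymerase that never extends a primer with a mismatched 3'-terminal nucleotide, so the template nucleotide is inferred as the complement of the extended primer's 3' end. The primer names and sequences are hypothetical.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extended_primers(template_base: str, primers: Dict[str, str]) -> List[str]:
    """(cn): names of primers (5'->3') whose 3'-terminal nucleotide pairs with the template."""
    return [name for name, seq in primers.items() if COMPLEMENT[seq[-1]] == template_base]


def call_base(primers: Dict[str, str], extended: List[str]) -> str:
    """(cp): infer the template nucleotide from the primer(s) that were extended."""
    calls = {COMPLEMENT[primers[name][-1]] for name in extended}
    if len(calls) != 1:
        raise ValueError("no extension or ambiguous extension")
    return calls.pop()


# Four hypothetical selective primers that differ only in their 3'-terminal nucleotide.
primers = {"pA": "GGTCA", "pC": "GGTCC", "pG": "GGTCG", "pT": "GGTCT"}
hit = extended_primers("G", primers)
print(hit, call_base(primers, hit))  # ['pC'] G
```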

19. The method according to any one of claims 1 to 16, characterized in that the parallel identification of one or more nucleotides of the cut nucleic acid fragments from (b) is carried out via the sequence-specific attachment of adapters having overhanging ends of suitable length and type, which differ from one another in their overhang.

20. The method according to claim 19, characterized in that the overhanging ends of the adapter used comprise a degenerate portion and a portion with a defined sequence.

21. The method according to claim 19 or 20, characterized in that the adapters used, whose overhanging ends comprise different portions with a defined sequence, are labeled differently.
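Claims 19 to 21 can be illustrated with one query position in a 4-nucleotide overhang. The sketch assumes one adapter per label, with a defined nucleotide at the queried position and a degenerate portion (N) elsewhere. It also assumes idealized ligation that occurs only when every defined nucleotide pairs. Patterns are written in the fragment overhang's own orientation, and the physical adapter overhang is their reverse complement. All names are hypothetical.

```python
from typing import Dict, Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def query_adapters(overhang_length: int, position: int) -> Dict[str, str]:
    """Label -> pattern the adapter anneals to (defined base at `position`, N elsewhere)."""
    adapters = {}
    for base in "ACGT":
        pattern = ["N"] * overhang_length
        pattern[position] = base
        adapters[base] = "".join(pattern)
    return adapters


def ligates(pattern: str, fragment_overhang: str) -> bool:
    """Idealized ligation: the degenerate N pairs with anything, defined bases must match."""
    return len(pattern) == len(fragment_overhang) and all(
        p == "N" or p == f for p, f in zip(pattern, fragment_overhang)
    )


def read_label(fragment_overhang: str, position: int) -> Optional[str]:
    """Return the label of the single adapter that ligates, or None."""
    labels = [
        label
        for label, pattern in query_adapters(len(fragment_overhang), position).items()
        if ligates(pattern, fragment_overhang)
    ]
    return labels[0] if len(labels) == 1 else None


print(read_label("ACGT", 2))                              # G
print(reverse_complement(query_adapters(4, 2)["G"]))      # NCNN
```

Because every adapter in a query set has the same degenerate portion, the label read out reports the nucleotide at exactly one position.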

22. The method according to any one of claims 1 to 21, characterized in that it is used for cataloging nucleic acid signatures.

23. The method according to any one of claims 1 to 21, characterized in that it is used to generate EST libraries.

24. The method according to claim 1 or 2, characterized in that it is used to identify genes that are differentially expressed in at least two biological samples.

25. The method according to claim 24, characterized in that process step a) comprises the following substeps a1) to e1):

a1) providing at least one mixture of nucleic acid fragments, in particular at least one mixture of cDNA fragments;

b1) separating the mixture of nucleic acid fragments from (a1) according to at least one fragment-specific property;

c1) optionally detecting the relative frequency of some or all of the fragments in the separated mixture from (b1);

d1) optionally comparing the information obtained in (b1) and/or (c1) about the composition of different mixtures of nucleic acid fragments from (a1);

e1) optionally registering nucleic acid fragments detected in (d1) that occur in different mixtures of nucleic acid fragments at different relative frequencies;

that process step b) is replaced by process step f1), where f1) is as follows: f1) incubation of a mixture of nucleic acid fragments selected from group I: a subset of the mixture from (a1); group II: the mixture of cDNA fragments separated in (b1), or a part thereof; group III: a mixture of nucleic acid fragments that is at least partially identical to the mixture from (a1) or to the separated mixture from (b1) but additionally has at least one recognition site for a restriction endonuclease that cuts outside its recognition site; with at least one restriction endonuclease that cuts outside its recognition site;

and that process step c) comprises the following sub-steps g1) to k1):

g1) identification of a first nucleotide of the cut nucleic acid fragments from (f1), the identification being carried out simultaneously for several or all nucleic acid fragments;

h1) optionally identifying a further nucleotide of the cut nucleic acid fragments from (f1), the identification being carried out simultaneously for several or all nucleic acid fragments;

i1) optionally repeating step (h1) until the desired number of nucleotides has been identified;

j1) optionally repeating steps (f1) to (i1) once or several times, the position and/or sequence of the recognition site being varied such that each repetition of steps (f1) to (i1) allows the identification of previously unidentified nucleotides;

k1) combining the sequence information obtained in steps (g1) to (j1), for all nucleic acid fragments or for a selected group of the nucleic acid fragments, into fragment-specific signatures, a signature optionally containing, in addition to the sequence information, further information about the respective fragment;

and that optionally at least one of the optional steps l1) and m1) is additionally carried out, l1) and m1) being as follows:

l1) obtaining fragments of interest from the mixture of nucleic acid fragments from (a1) or (b1), the fragments of interest preferably being the fragments registered in (e1);

m1) identifying, by searching electronic databases, the genes from which the nucleic acid fragments of interest are derived, the fragments of interest preferably being the fragments registered in (e1).
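Step k1) merges the nucleotides identified in the repeated rounds of j1) into one signature per fragment. A minimal sketch follows, assuming each round reports a 1-based start position in the fragment and the nucleotides read from there, and that unidentified positions are written as `N`. `assemble_signature` is a hypothetical name, and raising on conflicting calls is a design choice of this sketch, not something the claim prescribes.

```python
from typing import Dict


def assemble_signature(reads: Dict[int, str], length: int) -> str:
    """Merge nucleotides identified in several rounds into one fragment signature.

    reads: 1-based start position -> nucleotides identified from that position.
    length: number of positions the signature covers (hypothetical parameter).
    """
    signature = ["N"] * length
    for start, bases in reads.items():
        for offset, base in enumerate(bases):
            pos = start - 1 + offset
            if not 0 <= pos < length:
                raise ValueError("read extends beyond the signature length")
            if signature[pos] not in ("N", base):
                raise ValueError(f"conflicting calls at position {pos + 1}")
            signature[pos] = base
    return "".join(signature)


# Two hypothetical rounds: one read positions 1-2, another positions 5-6.
print(assemble_signature({1: "AC", 5: "GT"}, 8))  # ACNNGTNN
```

Further fragment information (for example an apparent length) could be stored alongside the string. It is left out here to keep the sketch short.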

26. The method according to claim 24, characterized in that process step a) is replaced by process step a2), where a2) reads as follows:

a2) provision of at least one mixture of nucleic acid fragments having a linker and, within the sequence of the linker, at least one recognition site for at least one restriction endonuclease that cuts outside its recognition site;

that process step b) is replaced by process step b2), b2) being as follows:

b2) incubation of the mixture of nucleic acid fragments from (a2) with the at least one restriction endonuclease from step (a2),

that process step c) comprises the sub-steps c2) to i2), where c2) to i2) are as follows:

c2) identification of a first nucleotide of the cut nucleic acid fragments from (b2), the identification being carried out simultaneously for several or all nucleic acid fragments of the mixture while the mixture of cut nucleic acid fragments is separated according to at least one fragment-specific property;

d2) optionally identifying a further nucleotide of the cut nucleic acid fragments from (b2) according to step (c2);

e2) optionally repeating step (d2) until the desired number of nucleotides has been identified;

f2) repeating steps (a2) to (e2) one or more times if necessary, the position and/or sequence of the recognition site being changed such that each repetition permits the identification of previously unidentified nucleotides;

g2) combining the sequence information obtained in steps (c2) to (f2), for all nucleic acid fragments or for a selected group of the nucleic acid fragments, into fragment-specific signatures, a signature optionally containing, in addition to the sequence information, further information about the respective fragment;

h2) assigning the fragment-specific information obtained in (c2) from the separation according to a fragment-specific property to the signatures obtained in (g2) for the nucleic acid fragments, where, in the case of an electrophoretic separation of the fragments, the fragment-specific information is the relative or absolute mobility of the fragments and/or the apparent or actual fragment lengths determined on the basis of a length standard, and the assignment can be made in tabular and/or computer-readable form;

i2) optionally identifying the genes from which the nucleic acid fragments are derived by searching electronic databases for the signatures from (g2);

and that, in addition, at least one of steps j2) to p2) is carried out, j2) to p2) being as follows:

j2) optionally providing at least one further mixture of nucleic acid fragments, obtained analogously to the mixture of nucleic acid fragments from (a2), in which case the addition of linkers having at least one recognition site for a restriction endonuclease that cuts outside its recognition site can be omitted;

k2) separating the mixture of nucleic acid fragments from (j2) according to a fragment-specific property;

l2) assigning the fragment-specific information obtained during the separation according to a fragment-specific property in (k2) to the individual separated fragments;

m2) where necessary, comparing the relative or absolute frequencies of at least some of the fragments separated in (k2) with the relative or absolute frequencies of the respective homologous fragments originating from other nucleic acid fragment mixtures;

n2) optionally registering those fragments whose relative or absolute frequency differs from that of their homologous fragments originating from other nucleic acid fragment mixtures;

o2) optionally assigning the fragments registered in (n2) to the genes or transcripts from which they are derived;

p2) optionally obtaining the fragments registered in (n2) from the mixture of nucleic acid fragments from (a2) and/or (j2), wherein steps (j2) to (n2) can also be carried out before steps (a2) to (h2).
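Two of the data-handling steps above lend themselves to a short sketch. The first is deriving an apparent fragment length from an electrophoretic migration value via a length standard, as in h2). The second is registering fragments whose relative frequency differs between two mixtures, as in m2) and n2) or d1) and e1). Both are illustrations under stated assumptions, not the claimed procedure. Length is interpolated linearly in log10 between neighbouring ladder bands, and migration is assumed to increase with length, as with a capillary migration time. The two-fold threshold and the small pseudo-frequency for absent fragments are hypothetical choices, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def apparent_length(migration: float, ladder: Sequence[Tuple[float, float]]) -> float:
    """Apparent length (bp) from a migration value and (migration, length_bp) ladder bands.

    Assumes distinct ladder migration values that increase with length.
    """
    points = sorted(ladder)
    for (m0, l0), (m1, l1) in zip(points, points[1:]):
        if m0 <= migration <= m1:
            t = (migration - m0) / (m1 - m0)
            log_len = math.log10(l0) + t * (math.log10(l1) - math.log10(l0))
            return 10 ** log_len
    raise ValueError("migration value outside the range of the length standard")


def relative_frequencies(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert per-fragment signal (e.g. peak areas) to relative frequencies."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


def register_differential(
    sample_a: Dict[str, float],
    sample_b: Dict[str, float],
    fold: float = 2.0,      # hypothetical threshold
    pseudo: float = 1e-9,   # avoids division by zero for absent fragments
) -> List[str]:
    """Names of fragments whose relative frequency differs at least `fold`-fold."""
    fa = relative_frequencies(sample_a)
    fb = relative_frequencies(sample_b)
    registered = []
    for name in sorted(set(fa) | set(fb)):
        ratio = (fa.get(name, 0.0) + pseudo) / (fb.get(name, 0.0) + pseudo)
        if ratio >= fold or ratio <= 1.0 / fold:
            registered.append(name)
    return registered


print(round(apparent_length(15.0, [(10.0, 100), (20.0, 1000)]), 1))  # 316.2
print(register_differential({"f1": 50, "f2": 25, "f3": 25},
                            {"f1": 50, "f2": 45, "f3": 5}))         # ['f3']
```

Semi-log interpolation is a common first approximation for sizing against a ladder. Real sizing software may use other calibration models.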