DE10356783A1

DE10356783A1 - Method for multiplex sequencing

Info

Publication number: DE10356783A1
Application number: DE10356783A
Authority: DE
Inventors: Diethard Prof. Dr. Tautz; Alexander Pozhitkov
Original assignee: Universitaet zu Koeln
Current assignee: Universitaet zu Koeln
Priority date: 2003-12-02
Filing date: 2003-12-02
Publication date: 2005-07-07
Also published as: WO2005054504A1

Abstract

A method for multiplex sequencing is disclosed, with which complex mixtures of nucleic acids can be sequenced in a single reaction vessel.

Description

A method for multiplex sequencing is disclosed, with which complex mixtures of nucleic acids can be sequenced in a single reaction vessel.

Background of the invention

Identifying an organism, or parts of an organism such as plasmids or transposons in a microorganism, is a first essential step in several fields of science and determines how to proceed. For example, only medical microbiological diagnostics, i.e. identification of the causative pathogen, makes possible a pathogen-specific therapy, which is often less burdensome for the patient. Medical epidemiological surveillance, for example of organisms with antibiotic resistance, likewise requires the identification of organisms carrying particular plasmids. Furthermore, environmental biological diagnostics, as carried out to assess water quality, allows conclusions to be drawn about sources of contamination. In food diagnostics, sources of contamination can likewise be narrowed down by specifically identifying the contaminating organisms; to assess habitats, on land and in water, the organisms present must also be identified and quantified. A further field of application is the determination of expressed genes in normal or pathologically altered tissues, where quantitative and qualitative analysis of expressed messenger RNA (mRNA) allows conclusions about the cause and course of the disease.

Identification that is as fast and accurate as possible is desirable in the fields of application named above by way of example. Besides cultivation of organisms, in particular microorganisms, suitable approaches include molecular biological methods such as hybridization methods, e.g. microarrays, PCR-based amplification and signal amplification methods, and sequencing.

Cultivating microorganisms often yields only an incomplete picture of all microorganisms contained in a sample, because the cultivation conditions are not optimal for all organisms present. This method has various disadvantages and is therefore unsuitable for routine identification.

Compared with cultivation of microorganisms, methods based on nucleic acid hybridization are faster, more accurate and more comprehensive. A distinction is made between quantitative "slot" or "dot blot" hybridizations, in which the isolated total DNA or RNA of an organism is hybridized with a probe or a plurality of probes fixed on a support. Alternatively, whole cells are hybridized in situ. These methods are superior to cultivation of organisms in many respects. Nevertheless, they are not optimal, because some of them are costly, in particular the microarray methods, which are additionally burdened by the problem of cross-hybridization (cross-hybridization is a particularly serious problem when analysing closely related organisms).

Methods based on PCR (Saiki et al., Science 239, 487 (1988)) are now among the standard methods in molecular diagnostics. They have become established in particular because of their speed, relatively simple handling and relatively simple analysis of results. A distinction is made between qualitative and quantitative PCR methods. Qualitative PCR methods in particular are attractive for their simplicity and cost efficiency, but suffer from the occurrence of so-called "false negative" or "false positive" results. A first result obtained by qualitative PCR must therefore be confirmed by further test methods, which in turn increases costs. In both qualitative and quantitative PCR methods, the first rounds of amplification are decisive for the accuracy and reliability of the results. If the primers are poorly chosen, if the buffer or temperature is suboptimal, or if enzymes or NTPs are not available in sufficient quantity, the PCR reaction does not proceed optimally and the results become questionable and/or unclear, in particular because spurious amplification products appear. The number of potential sources of error increases further in so-called multiplex PCR methods (e.g. WO 01/88174), in which different target sequences are amplified simultaneously in a single PCR reaction. This requires several primer pairs, which is accompanied by the formation of primer dimers and leads to spurious amplification products. PCR-based methods alone therefore do not always allow the unreserved identification of organisms or parts of organisms.

Sequencing the genetic material of an organism alone sets the "gold standard" for the accurate identification of organisms. Here the nucleic acid sequence of a part of the genetic material of an organism is determined. Since 1975, several methods have been developed for the rapid sequencing of long DNA segments. The chemical DNA cleavage method developed by Allan Maxam and Walter Gilbert (Maxam, A. M., Gilbert, W., Methods in Enzymology, Vol. LXV, 499-560, Academic Press, New York (1980)) and the chain termination method of Frederick Sanger (Sanger et al., PNAS 74, 5463 (1977)) are the pioneering methods, and they have undergone many modifications (E.D. Hyman, Anal. Biochem. 174, 423 (1988); A. Rosenthal, US 4,849,077; M.L. Metzker et al., Nucleic Acids Res. 22, 4259 (1994); D.H. Jones, Biotechniques 22, 938 (1997)).

Various polymerases are used in the chain termination method; originally the enzyme DNA polymerase I from E. coli was used, and there are now modified/optimized E. coli DNA polymerase I and also thermostable polymerases that are suitable for generating a complementary copy of the single-stranded DNA to be sequenced. A suitable primer, the four deoxyribonucleoside triphosphates (dNTPs) and a small amount of a chain termination molecule such as 2',3'-dideoxyribonucleoside triphosphate (ddNTP) analogues must be available, and the newly formed copied fragments must be labelled in some way, e.g. with radioactive or fluorescent groups. In this method, four sequencing reactions must be set up per sequence to be sequenced, each reaction mixture containing all four dNTPs and one type of ddNTP. By using small amounts of chain termination molecules (ddNTPs), a set of truncated chains is synthesized, each terminating at the position where a chain termination molecule was incorporated instead of a normal dNTP. The statistical incorporation of ddNTPs into the growing chain yields DNA chain fragments that differ in length by one nucleotide each. The fragments are separated by length by electrophoresis, e.g. on polyacrylamide gels (then called sequencing gels). Each of the four reaction mixtures is loaded into its own lane, so that there are four lanes on the sequencing gel per DNA molecule to be sequenced.
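The four-lane readout of the chain termination method can be illustrated with a minimal sketch. It assumes an idealized reaction in which every possible termination product is present; the function names, the `primer_len` parameter and the example template are hypothetical and are not taken from the patent.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3to5):
    """Return the newly synthesized strand (5'->3') for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def lane_fragment_lengths(new_strand, primer_len=0):
    """For each ddNTP reaction (lane), list the lengths of the terminated fragments.

    A fragment ends wherever the ddNTP of that lane replaced the normal dNTP,
    so lane X holds one fragment per occurrence of X in the new strand.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(primer_len + position)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all bands from shortest to longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    new = synthesized_strand("TACGGT")
    lanes = lane_fragment_lengths(new, primer_len=20)
    print(lanes)
    print(read_gel(lanes))
```

Because consecutive fragments differ by exactly one nucleotide, sorting the bands of all four lanes by length recovers the synthesized sequence.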

Automated DNA sequencing methods in use today, such as "cycle sequencing" (Murray, V., Nucleic Acids Res. 17, 8889 (1989)), use ddNTPs labelled with fluorescent groups ("dye terminators") (WO 9720949); the labelled fragments are detected immediately during the electrophoretic separation and the incorporated nucleotide is recognized, which allows the sequence to be read directly (L.M. Smith et al., Nature 321, 674 (1986); W. Ansorge et al., J. Biochem. Biophys. Meth. 13, 315 (1986)). Using ddNTPs carrying different fluorescent labels makes it possible to detect all four individual sequencing reactions in a single gel lane or electrophoresis capillary. In addition, cycle sequencing amplifies the signal, which allows the concentration of the template DNA to be sequenced to be reduced.

A relatively new sequencing method is pyrosequencing (M. Ronaghi et al., Anal. Biochem. 242, 84 (1996); WO 93/23564; WO 89/09283). This method no longer requires ddNTPs. Instead, the release of pyrophosphate (PPi) during polymerase-catalysed dNTP incorporation (chain extension) is measured indirectly. The released PPi is converted to ATP by a sulfurylase, and the ATP then serves as a substrate for a firefly (Photinus pyralis) luciferase, producing a visible/measurable light signal. The light signal can be measured, for example, by a PMT unit and is displayed in a "Pyrogram™". Pyrosequencing employs at least three enzymes, namely a polymerase, a sulfurylase and a luciferase, and, as in the other sequencing methods, a template, a primer and dNTPs. In modified (Ronaghi et al., Science 281, 5375 (1998); WO 98/28440), automated pyrosequencing methods (see www.pyrosequencing.com), a nucleotide-degrading enzyme, an apyrase, is additionally present in the reaction mixture and continuously degrades unincorporated nucleotides. Furthermore, dATP can be replaced by an alpha-thio triphosphate (dATPαS), since this nucleotide is used efficiently by the DNA polymerase but not by the luciferase.

In pyrosequencing, in a first step, a primer is annealed to a single-stranded, optionally PCR-amplified DNA template and incubated together with the enzymes (DNA polymerase, ATP sulfurylase, luciferase and apyrase) and the substrates adenosine 5' phosphosulfate (APS) and luciferin. In a second step, the four dNTPs are added, the mixture containing dATPαS instead of dATP. The DNA polymerase catalyses the incorporation of a dNTP that is complementary to the base of the template strand. PPi is released in an amount equimolar to the amount of nucleotide incorporated. In a third step, the ATP sulfurylase converts the PPi to ATP in the presence of APS. The ATP in turn serves as a substrate of the luciferase and supplies the energy for the conversion of luciferin to oxyluciferin, producing a light signal whose intensity is proportional to the amount of ATP. The light is detected with a PMT unit and is displayed as a peak in a Pyrogram™. Each light signal is proportional to the number of nucleotides incorporated. In a fourth step, the enzyme apyrase degrades the unincorporated nucleotides and excess ATP. Only after all nucleotides have been completely degraded is a new dNTP added to the reaction. The addition of a new nucleotide is the fifth step in this method; each nucleotide is added individually so that, upon incorporation, the light signal can immediately be assigned to the nucleotide added.
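The relationship between nucleotide additions and peak heights can be sketched as follows. This is an idealized model in which the peak height equals the number of nucleotides incorporated in that addition and apyrase removes all leftovers before the next addition; the names `pyrogram`, `new_strand` and `dispensation` are illustrative assumptions.

```python
def pyrogram(new_strand, dispensation):
    """Return one idealized peak height per dispensed nucleotide.

    new_strand   -- sequence to be synthesized (5'->3'), i.e. the complement
                    of the template downstream of the primer
    dispensation -- order in which single nucleotides are added, e.g. "ACGTACGT"
    """
    position = 0
    peaks = []
    for nucleotide in dispensation:
        incorporated = 0
        # The polymerase extends as long as the next base matches the nucleotide
        # just added, so a homopolymer run gives a proportionally higher peak.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            incorporated += 1
            position += 1
        peaks.append(incorporated)
    return peaks


if __name__ == "__main__":
    print(pyrogram("GGATC", "ACGT" * 3))
```

In this model a dispensed nucleotide that does not match the next template position gives a peak of zero, and the run "GG" gives a single peak of height two.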

A large number of different sequencing techniques are available to the person skilled in the art. What they have all had in common so far is that the sequence to be sequenced must be isolated and purified before the actual sequencing can begin. Sequencing mixtures of different nucleic acids, generally described as multiplex sequencing, requires other methods.

A step towards multiplex sequencing is described in WO 02/04674, which describes a method for automated transposon-mediated multiplex DNA sequencing (TEMS) of DNA fragments located in vectors. The characteristic first step of this method is the mixing ("pooling") of a large number of vectors carrying the target DNA in one reaction vessel and the random integration of transposons into these vectors. The transposons contain sequences that are complementary to the primers to be used in the sequencing reaction. The second step consists of PCR reactions to identify those positive vectors that have integrated a transposon into the target DNA. The third and final step is the sequencing of individual positive vectors, for which individual sequencing reactions must be set up for each vector. This method is therefore not a multiplex sequencing method in the true sense, because purified vector DNA is sequenced rather than mixtures of nucleic acids.

WO 03/056030 and the associated scientific publication by K. Murphy and J.R. Eshleman (American Journal of Pathology 161, 27 (2002)) describe a "true" multiplex sequencing method; it allows different DNAs, in particular different PCR fragments, to be sequenced and analysed within one reaction vessel. The method is based on the use of several special primers, all but one of which contain long stretches of bases that are not complementary to the template; these primers are therefore unusually long. The resulting products differ according to the primer with which they were sequenced, which makes assignment by length possible. The method described in WO 03/056030 requires special primers, which make the whole method costly. In addition, it is limited to a small number of templates to be sequenced in parallel.

Summary of the invention

It has been found that complex mixtures of known polynucleic acids can be determined qualitatively and quantitatively by a sequencing reaction followed by comparison of the sequencing spectrum with the corresponding individual reactions. No special reagents are required; standard oligonucleotide primers, for example, can be used. This has the advantage that the method can easily be carried out in any laboratory without special purchases. Routine methods are used.

The invention relates to

(1) a method for analysing complex mixtures of known polynucleic acids, comprising the following steps: (a) providing a sequencing reaction mixture comprising the complex mixture of known polynucleic acids to be analysed, (b) adding to the mixture (a) one or more enzymes and at least one primer that has at least one sequence segment complementary to one or more sequences of the known polynucleic acids, (c) sequencing the mixture (a), wherein, simultaneously with or after completion of the sequencing reaction, (d) a common signal spectrum is recorded for all sequencing products, (e) the recorded signal spectrum is compared with the sequences and/or the signal spectra of the known polynucleotide sequences, and (f) the individual polynucleic acids of the mixture are identified and quantified; and
(2) a kit for carrying out the method (1).
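Steps (d) to (f) can be illustrated with a minimal sketch. The text only states that the mixture spectrum is compared with the known individual spectra; the sketch below additionally assumes that the mixture spectrum is a linear superposition of the individual spectra and estimates the relative amounts by ordinary (unconstrained) least squares. That model, the function names and the toy spectra are assumptions for illustration and are not the procedure specified in the patent.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("library spectra are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def estimate_amounts(library, mixture):
    """Least-squares estimate of how much each library spectrum contributes.

    library -- dict mapping a species name to its individual spectrum
    mixture -- spectrum recorded for the mixture (same length as each entry)
    """
    names = sorted(library)
    # Normal equations (A^T A) x = A^T b, with the library spectra as columns of A.
    ata = [
        [sum(p * q for p, q in zip(library[i], library[j])) for j in names]
        for i in names
    ]
    atb = [sum(p * q for p, q in zip(library[i], mixture)) for i in names]
    return dict(zip(names, solve_linear(ata, atb)))


if __name__ == "__main__":
    library = {"T": [1, 0, 2, 0], "E": [0, 1, 1, 1]}  # hypothetical spectra
    mixture = [0.75, 0.25, 1.75, 0.25]
    print(estimate_amounts(library, mixture))
```

A species whose estimated amount is close to zero would be reported as absent. With noisy data a non-negativity constraint would be a natural refinement, but it is not implemented here.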

The method according to the invention uses only typical sequencing primers and in principle allows several hundred different templates to be sequenced in parallel. With possible technical advances in sequencing technology, even larger numbers of templates are conceivable.

Brief description of the figures

Figure 1 shows a library of pyrograms for the species T, E, C, A, H, N4 and O, which were generated in the experimental example.

Figure 2 shows pyrograms of mixtures 1 to 3, which were generated in the experimental example.

Detailed description of the invention

In the method according to the invention, the complex mixture of polynucleic acids consists of at least two different polynucleic acids. These preferably differ at at least two, more preferably at at least five, six, seven, eight, nine or ten sequence positions, the differences resulting either from a nucleotide exchange or from the insertion or deletion of one or more nucleotides.

According to the method of the invention, the complex nucleic acid mixtures comprise DNA molecules, RNA molecules, or mixtures or derivatives thereof. The enzymes are preferably selected from DNA polymerases, reverse transcriptases, and auxiliary enzymes for quantifying the detection reaction. In the method according to the invention, sequence profiles for comparison have as a rule been generated for the known individual sequences to be detected. In addition, standards of known individual sequences for which a library of individual sequence profiles has been generated can also be added to the sequencing reaction mixture.

In step (b), nucleotides or nucleotide derivatives such as dNTPs and ddNTPs are additionally added to the reaction mixture. In a preferred embodiment of the method, the sequencing reaction mixture consists of polynucleic acids isolated directly from organisms or tissues that contain polynucleic acids.

In the method according to the invention, sequencing is carried out by a method selected from pyrosequencing, sequencing of single-stranded templates with dideoxynucleotides, cycle sequencing of single- or double-stranded templates with dideoxynucleotides, or other enzymatic or chemical sequencing methods, provided that the sequencing reactions yield reproducibly quantifiable signals for the individual sequenced nucleotide positions. For details of these sequencing methods, reference is made to the literature cited in the introduction.

In the method according to the invention, preferably at least one primer is used that has a length of at least 12, preferably 18 to 25, nucleotides and contains one or more sequence segments complementary to segments of known polynucleic acids.

In a variant of the method according to the invention, the mixture to be analyzed is subjected to an amplification reaction before the sequencing reaction, preferably a PCR or an amplification of RNA by means of RNA polymerases (Eberwine, J. Biotechniques, 20:584-591 (1996)).

In a particularly preferred embodiment of the method according to the invention, the polynucleic acids are DNA molecules and sequencing is carried out by pyrosequencing. In this case, DNA polymerase, sulfurylase, luciferase and apyrase are added as enzymes in step (b), and dNTPs as nucleotides.

In a further particularly preferred embodiment of the method according to the invention, the polynucleic acids are RNA molecules and sequencing is carried out by pyrosequencing using a reverse transcriptase. In this case, reverse transcriptase is added in step (b) in place of DNA polymerase, with dNTPs as nucleotides.

In the method according to the invention, the polynucleic acid mixture to be sequenced can (i) be obtained from a mixture of organisms in order to determine its composition qualitatively and quantitatively, or (ii) be obtained from individual organisms or tissues in order to determine the composition of the DNA and/or RNA fractions qualitatively and quantitatively.

In a further particularly preferred embodiment of the method according to the invention, ribosomal RNA is used as the sequencing template in the polynucleic acid mixture.

In a further preferred embodiment, messenger RNA is used as the sequencing template in the polynucleic acid mixture, wherein

(i) primers are used that are complementary to the poly-A region of the mRNA or to the oligo-dT region of the corresponding cDNA and that contain one or more specific nucleotides at the 3' end, and/or
(ii) primers are used that are partially or fully complementary to specifically selected mRNA sequences.

The method according to the invention is suitable, for example, for the analysis of microorganisms and for disease diagnostics.

The kit according to embodiment (2) of the invention preferably contains

(i) suitable primers for carrying out the sequencing reaction and/or
(ii) suitable primers for carrying out the amplification and/or
(iii) suitable enzymes for carrying out the sequencing reaction and/or
(iv) suitable chemicals for carrying out the sequencing reaction and/or
(v) suitable controls for carrying out the sequencing reaction and/or
(vi) suitable controls for carrying out the amplification and/or
(vii) a library of the sequence profiles of the individual sequences to be detected and/or
(viii) a computer program for performing the necessary calculations.

Sequencing can be carried out by a variety of different methods. The preferred sequencing method for the method of the present invention is pyrosequencing. As discussed above, pyrosequencing is a technique based on the fact that the DNA polymerization reaction is accompanied by the release of pyrophosphate. During sequencing, a particular nucleotide triphosphate is added to the mixture of template, polymerase and annealed primer. If incorporation takes place, pyrophosphate is released. Together with luciferin and luciferase, the pyrophosphate generates a flash of light that can be measured and quantified. The amount of light is proportional to the concentration of pyrophosphate and accordingly to the concentration of the template.

The amounts of the individual sequencing templates in the mixture are preferably determined by a system of linear equations, which can be written as follows:

$$S_j = \sum_{i} k_{ji}(X)\, n_{ji}(X)\, x_i, \qquad j = 1, \dots, L$$

where $S_j$ is the peak intensity at the j-th step for a particular nucleotide, $k_{ji}(X)$ is the linear coefficient between brightness and probability of incorporation of a particular nucleotide X at the j-th sequencing step for the i-th sequencing template, $n_{ji}(X)$ is the number of available incorporation events for nucleotide X at the j-th step for the i-th sequencing template (0, 1, 2, 3 ...), $x_i$ is the sought concentration of the i-th template, and L is the number of steps. The term $n_{ji}(X)$ is specific to the pyrosequencing method, because consecutive identical nucleotides are combined there into a single peak. With other sequencing methods that report every nucleotide position individually (e.g. the ddNTP methods), this term would always be 1 and can simply be omitted.

This system of equations can be written compactly in matrix form: $N \cdot X = S$, where N is the matrix of $n_{ji}(X)$ multiplied by $k_{ji}(X)$, X is the vector of $x_i$, and S is the vector of peak intensities. This system can be solved analytically using conventional methods (Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. (1993). Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press; 2nd edition). Multiplying both sides by $N^T$ gives

$$N^T \cdot N \cdot X = N^T \cdot S$$

It is known that $N^T \cdot N$ is a square matrix regardless of N. Accordingly, multiplying by the inverse of the $N^T \cdot N$ matrix produces an identity matrix:

$$(N^T \cdot N)^{-1} \cdot N^T \cdot N \cdot X = (N^T \cdot N)^{-1} \cdot N^T \cdot S$$

The matrix $(N^T \cdot N)^{-1} \cdot N^T \cdot N$ is an identity matrix, with 1 on its diagonal and 0 at all other positions. The solution is therefore

$$X = (N^T \cdot N)^{-1} \cdot N^T \cdot S$$

and the diagonal of $R \cdot (N^T \cdot N)^{-1}$ contains the square of the standard deviation of each solution, where

$$R = \frac{(N X_f - S)^T \cdot (N X_f - S)}{n - P}$$

$X_f$ := theoretical solution of the system
n := number of rows of N
P := number of columns of N.
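The normal-equation solution above can be sketched in a few lines of code. The following is a minimal illustration in plain Python, not the Matlab calculation used in the example; the function names (`solve_mixture`, `invert` and the helpers) and the small test matrix are assumptions made for this sketch. It presumes an overdetermined system (n > P), since R divides by n − P.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting; raises if m is singular."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("N^T N is singular; more sequencing steps are needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def solve_mixture(N, S):
    """Return (X, sd) with X = (N^T N)^-1 N^T S and sd from diag(R (N^T N)^-1)."""
    n, P = len(N), len(N[0])          # rows = steps, columns = templates
    Nt = transpose(N)
    inv = invert(matmul(Nt, N))
    X = [row[0] for row in matmul(inv, matmul(Nt, [[s] for s in S]))]
    residual = [sum(a * x for a, x in zip(row, X)) - s for row, s in zip(N, S)]
    R = sum(r * r for r in residual) / (n - P)   # requires n > P
    sd = [(R * inv[i][i]) ** 0.5 for i in range(P)]
    return X, sd

# Hypothetical 4-step, 2-template system with noise-free intensities.
N = [[1, 0], [0, 1], [2, 1], [1, 1]]
S = [3.0, 0.5, 6.5, 3.5]              # generated from X = [3.0, 0.5]
X, sd = solve_mixture(N, S)
```

With noise-free input the recovered X matches the concentrations used to build S and the standard deviations are close to zero; measured intensities would yield non-zero values.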

The number of steps required for a unique solution must at least be chosen such that the matrix N is not singular. In practice it is better to overdetermine the system of equations, so that perturbations of the measured intensities do not distort the solution and the noise is compensated.

In practice, this means that the solution is simplest when the sequences to be determined are completely different. In this case, the number of different sequences that can be detected corresponds to the number of sequencing steps that can be carried out reliably, less any overdetermination of the system that may be required. If the sequences are partially similar to one another, the minimum number of steps needed to distinguish them increases; this number can be calculated in advance (cf. the example). It follows that the method according to the invention is particularly well suited to cases in which the sequences differ strongly from one another. The differences may be nucleotide substitutions as well as insertions and deletions.

In practice, the coefficients $k_{ji}(X)$ are unknown. Ideally they would all be equal and could therefore be removed from the system of equations, but reality is more complicated. To solve the problem of unknown coefficients, the pyrogram of each individual sequence must be recorded, e.g. from clones or from synthetic oligonucleotides, and stored in a library. A pyrogram of a given sequence is thus one column of the matrix N. The solution X is then obtained as a multiple of the concentrations that were used when the pyrogram library was recorded. It is therefore sensible to use equal concentrations when recording the libraries. The libraries can be recorded once and kept for all subsequent measurements. When sequencing an unknown mixture, it is also advisable to add a known amount of a sequence that is not normally present in the sample. The pyrogram of this standard must of course also be available in the library. Once the solution has been found, all variables can be referred to the standard, and the final concentrations can be determined from the known concentration of the standard. Using a standard makes the solution less sensitive to changes in the instrument characteristics between determinations.
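Referring the solution to a spiked standard amounts to a single rescaling. The sketch below illustrates this under assumptions: the function name `absolute_concentrations`, the dictionary layout and all numerical values are hypothetical and are not taken from the patent's tables.

```python
def absolute_concentrations(solution, standard, standard_conc):
    """Rescale library-relative solutions so the standard equals its known concentration.

    solution      -- dict mapping sequence name to its solution x_i
                     (a multiple of the concentration used for the library)
    standard      -- name of the spiked standard sequence
    standard_conc -- known concentration of the standard in the sample
    """
    factor = standard_conc / solution[standard]
    return {name: value * factor for name, value in solution.items()}

# Hypothetical solution vector; "T" plays the role of the spiked standard.
relative = {"A": 2.0, "T": 0.5, "O": 1.0}
absolute = absolute_concentrations(relative, standard="T", standard_conc=10.0)
```

Because every component is multiplied by the same factor, a drift in overall instrument sensitivity between runs cancels out, which is the benefit the text attributes to the standard.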

In the ideal case, i.e. when all $k_{ji}(X) = 1$, it is necessary to determine the minimum number of sequencing steps as a function of the actual sequences. The sequences themselves evidently carry a certain amount of information, which influences the number of steps to be performed. For this reason, the matrix N contains only the number of nucleotides available for incorporation at each individual step. A matrix of this kind for a given number of steps can be constructed from the sequences by a simple algorithm. Once the matrix has been constructed, it must be tested for singularity; if it is singular, further steps are added, the matrix is regenerated and retested, and the procedure is repeated until the matrix is no longer singular.
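The iterative search for the minimum number of steps can be sketched as follows. This is an illustration only: the cyclic dispensation order "ACGT", the function names, and the convention that each sequence is written as the strand being synthesized are assumptions, and "not singular" is interpreted here as N having full column rank (so that $N^T \cdot N$ is invertible).

```python
from fractions import Fraction

def virtual_pyrogram(sequence, dispensation):
    """Ideal pyrogram: nucleotides incorporated at each dispensation step (all k = 1)."""
    pos, counts = 0, []
    for nt in dispensation:
        run = 0
        # A homopolymer run is incorporated in one step and gives one peak.
        while pos < len(sequence) and sequence[pos] == nt:
            run += 1
            pos += 1
        counts.append(run)
    return counts

def rank(rows):
    """Exact matrix rank by Gaussian elimination over rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def minimum_steps(sequences, order="ACGT", max_steps=200):
    """Smallest number of steps for which the ideal matrix N has full column rank."""
    for steps in range(len(sequences), max_steps + 1):
        dispensation = [order[j % len(order)] for j in range(steps)]
        profiles = [virtual_pyrogram(s, dispensation) for s in sequences]
        N = [list(row) for row in zip(*profiles)]   # rows = steps, columns = sequences
        if rank(N) == len(sequences):
            return steps
    return None  # not distinguishable within max_steps

# Hypothetical toy sequences, not the rRNA sequences of the example.
steps_needed = minimum_steps(["ACGT", "AGGT", "TTCA"])  # 4
```

In this toy case three steps are not enough because the third sequence gives no signal until T is dispensed; the fourth step makes the three columns linearly independent.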

Multiplex sequencing of a complex mixture of nucleic acids produces a complex pattern of superimposed sequences. This pattern is compared with the already known patterns of individual nucleic acids, such as are accessible for a large number of organisms in various databases. The invention is explained in more detail by the following examples, which do not, however, limit the method according to the invention.

Examples

Example 1

The method according to the invention was demonstrated using mixtures of PCR products of cloned rRNA genes. First, a library of pyrograms was recorded for seven rRNA sequences (abbreviations: A = Alge01, T = Tardig3, O = Ostrac7, H = Harpac13, C = Cyclop13, E = Epheme1, N4 = Nematd40). The sequences were originally described in the doctoral thesis of Melanie Markmann (University of Munich, 2000), and the designations follow those used in that thesis. The sequences used are shown in SEQ ID NOs: 1 to 7. Cloned fragments of the sequences are available, inserted into the multiple cloning site of the vector pZERO-2. The fragments to be sequenced were amplified by PCR from the plasmid DNA of these clones. The PCR amplification was carried out in a volume of 30 μl with 37 cycles, at a template concentration of 0.67 ng/μl. The primer 5'-GAC-CCG-TCT-TGA-AAC-ACG-G-3' (SEQ ID NO: 8) was used as the forward primer and the 5'-biotinylated primer 5'-ATC-GAT-TTG-CAC-GTC-AGA-A-3' (SEQ ID NO: 9) as the reverse primer.

Pyrosequencing was performed with the sequence 5'-GAA-ACA-CGG-ACC-AAG-GAG-T-3' (SEQ ID NO: 10) as the sequencing primer, using the standard protocol. Fig. 1 shows the profiles of the pyrograms.

The original PCR products were discarded in order to simulate a real experiment in which the sequencing profile library has been compiled beforehand. A further PCR was carried out with the same fragments. Agarose gel electrophoresis showed that the concentrations of the products were identical, at about 40 ng/μl. The PCR products were used to prepare three mixtures.

Table 1 lists the mixtures in μl of the PCR products. The mixtures were subjected to pyrosequencing, and the resulting pyrograms are shown in Fig. 2. These pyrograms were analyzed according to the procedure set out in detail in the description. Because the sequences used are identical or overlapping at some positions, the minimum number of sequencing steps that would allow independent detection of all seven sequences was calculated. For this purpose, a simulation algorithm was programmed that performs a virtual pyrosequencing in which all incorporation steps are exactly proportional. The resulting virtual pyrograms are inserted as columns into the ideal matrix N. After each sequencing step, the matrix is tested for singularity. The minimum necessary number of steps has been found when the matrix is no longer singular. In this way, the minimum number of steps for the chosen sequences was determined to be 34. Both the pyrograms of the mixtures and the profiles from the library were truncated after the 50th step and thus provide 16 additional equations that overdetermine the system of equations.

The vector S in the equation set out above corresponds to the peak intensities of each step in the sequencing reaction of the mixtures. These intensities were recorded numerically as a table and inserted into the calculation as a column vector. The further calculations according to the equation were performed with the Matlab software package (The MathWorks Inc., Natick, USA, version 6.1.0.450). The system was solved and the solutions were referred to the "known" concentration of the standard (in this case species T).

The solutions and their standard deviations are shown in Table 2.

Table 3 shows the concentration values calculated from these solutions. In mixture 1, the sequence components are clearly well recognized. The absent species show values of -1 to 2, which is due to background noise. Indeed, the standard deviations of these solutions are larger than the solutions themselves, whereas those of the sequence components that are present were smaller than the determined values. By analogy, the composition of the other mixtures (mixture 2 and mixture 3) can also be recognized adequately (see Table 3). The largest relative deviation from the actual composition occurs in the most complex mixture (mixture 3).
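The criterion described here, a component counting as present only when its solution exceeds its own standard deviation, can be written as a short filter. The sketch below is an illustration under assumptions: the function name and all numbers are hypothetical and do not reproduce the values of Table 2.

```python
def detected_components(solution, sd):
    """Names of components whose solution is larger than its standard deviation."""
    return sorted(name for name in solution if solution[name] > sd[name])

# Hypothetical solutions and standard deviations for four library sequences.
solution = {"A": 48.0, "T": 50.0, "O": 1.5, "H": -0.8}
sd = {"A": 3.0, "T": 2.5, "O": 2.2, "H": 2.0}
present = detected_components(solution, sd)  # ["A", "T"]
```

A stricter threshold, for example a multiple of the standard deviation, could be substituted; the text itself only states the simple comparison.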

Table 1: Composition of the mixtures

Table 2: Solutions and standard deviations

Table 3: Comparison of found and specified ratios

Example 2

The polynucleic acid mixture to be sequenced can also initially consist of RNA instead of single-stranded DNA. In this case, two alternative procedures are possible. Either a reverse transcriptase is used as the sequencing enzyme in place of the DNA polymerase, or the RNA is converted by a reverse transcriptase reaction into a single-stranded DNA, which can then be used as the template in a DNA sequencing reaction.

Using RNA for direct sequencing can make a preceding PCR unnecessary if the RNA to be sequenced is already present in sufficient quantity in the polynucleic acid mixture. This applies in particular to ribosomal RNA (rRNA), which is present in cells in high copy number and as a rule makes up the bulk of the total RNA. It is therefore particularly suitable for characterizing a mixture of microorganisms or eukaryotic organisms. To analyze such a mixture, it is sufficient to isolate the RNA instead of the DNA. By using rRNA-specific primers, the rRNA can be used directly as the sequencing template without prior PCR amplification. This avoids the known problems of PCR amplification, which is not quantitative, particularly when mixtures of sequences are amplified, and can even lead to recombination between different fragments.

Carrying out a multiplex sequencing experiment based on rRNA essentially follows the procedure of the example described above. It is only necessary to transcribe the rRNA into a single-stranded DNA template using a suitable primer and reverse transcriptase. The further procedure then corresponds exactly to the example. Alternatively, the pyrosequencing mixture can be optimized to use a reverse transcriptase as the sequencing enzyme. This option, however, is not yet commercially available.

The method according to the invention can also be carried out with other common sequencing methods. The main candidate is cycle sequencing, in which a thermostable DNA polymerase is used as the sequencing enzyme and the signal is amplified at the same time through multiple sequencing/denaturation cycles. This also opens up use with comparatively small amounts of sequencing template. It thus becomes possible to sequence complex RNA mixtures from a tissue of a single organism in parallel.

The procedure is as follows: the total RNA is isolated from the tissue, i.e. both the ribosomal RNA and the messenger RNA (mRNA). A single-stranded cDNA is then produced from the mRNA using reverse transcriptase and oligo-dT primers. This cDNA is subjected to a cycle sequencing reaction. In one embodiment, a single anchor primer is used that is essentially complementary to the poly-A sequence of the mRNA (or complementary to the poly-T sequence of the corresponding cDNA) but also contains one or more additional specific nucleotides at the 3' end, so that only a subset of all RNAs is recognized and sequenced. In an alternative embodiment, a mixture of selected primers is used. These primers are specific for particular genes whose expression in the tissue in question is to be tested in parallel.
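The subset selection by an anchor primer can be illustrated with a minimal Python sketch. It is a simplified model, not part of the patent: it assumes the primer is oligo-dT with extra 3' nucleotides annealing to the mRNA sense strand (written in the DNA alphabet), ignores mismatches and annealing conditions, and uses hypothetical transcript names and sequences. The function `selected_by_anchor` is likewise an illustrative name.

```python
# Simplified model: an oligo-dT anchor primer with extra 3' nucleotides
# primes only those transcripts whose bases directly upstream of the
# poly-A tail are complementary to the extra nucleotides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def selected_by_anchor(mrna: str, anchor_3prime: str) -> bool:
    """True if the anchor's 3' nucleotides pair with the bases just
    upstream of the poly-A tail (assumption: perfect matches only)."""
    body = mrna.rstrip("A")  # remove the poly-A tail
    return body.endswith(reverse_complement(anchor_3prime))


# Hypothetical transcripts, 5' -> 3', each ending in a poly-A tail.
transcripts = {
    "gene_a": "GGCTCAGAAAAAAAAAA",
    "gene_b": "TTGACCTAAAAAAAAAA",
    "gene_c": "CATGGAGAAAAAAAAAA",
}

# Anchor primer 5'-(T)n-CT-3': the extra nucleotides "CT" pair with "AG".
subset = sorted(
    name for name, seq in transcripts.items() if selected_by_anchor(seq, "CT")
)
print(subset)
```

In this model, each additional specific nucleotide at the 3' end narrows the set of transcripts that are primed, which is the effect the anchor primer embodiment relies on.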

Direct sequencing of mRNA mixtures from tissues can be used in particular in disease diagnosis, where the aim is to detect and quantify specific changes in gene expression in diseased tissue. At present, microarray methods are mainly used for this application. What matters here is that it is usually sufficient to determine specifically the expression of a few up to at most a few dozen genes. Parallel sequencing can therefore be used, since in this case the number of nucleotides that can be sequenced in one reaction corresponds to the number of different templates in the mixture that are theoretically detectable. Typical cycle sequencing reactions allow the reliable determination of up to 500 nucleotides. In this case, parallel sequencing has substantial cost and time advantages over the common microarray methods. To make the sequencing reaction more robust, the mRNA can be pre-amplified, as is also usual in microarray methods.
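The link between the number of sequenced positions and the number of distinguishable templates can be made concrete with a small Python sketch. It rests on assumptions that are not taken from the patent text: the common signal is modeled as an idealized, noise-free sum of one-hot reference profiles (four base channels per position), and the abundances are recovered by ordinary least squares via the normal equations. The reference sequences, abundances, and function names are hypothetical.

```python
BASES = "ACGT"


def profile(seq):
    """Idealized signal profile: one channel per base (A, C, G, T) per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def solve_linear(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("reference profiles are not distinguishable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def estimate_abundances(references, mixed_signal):
    """Least-squares fit of the mixed signal as a weighted sum of reference profiles."""
    profiles = [profile(seq) for seq in references]
    k = len(profiles)
    gram = [[sum(a * b for a, b in zip(profiles[i], profiles[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * s for a, s in zip(profiles[i], mixed_signal)) for i in range(k)]
    return solve_linear(gram, rhs)


# Hypothetical known templates and their relative abundances in the mixture.
references = ["ACGTAC", "ACGGAC", "TCGTAA"]
true_abundances = [0.5, 0.3, 0.2]

# Simulated common signal spectrum: weighted sum of the reference profiles.
mixed = [sum(w * p for w, p in zip(true_abundances, column))
         for column in zip(*(profile(s) for s in references))]

estimates = estimate_abundances(references, mixed)
print([round(e, 3) for e in estimates])
```

The fit only has a unique solution while the reference profiles are linearly independent, which is why the number of sequenced positions limits how many templates can be resolved. Real traces would also need noise handling and a non-negativity constraint, both omitted here.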

A sequence listing according to WIPO St. 25 follows. It can be downloaded from the official publication platform of the DPMA.

Claims

Method for the analysis of complex mixtures of known polynucleic acids, comprising the following steps: (a) providing a sequencing reaction mixture comprising the complex mixture of known polynucleic acids to be analyzed, (b) adding to the mixture (a) one or more enzymes and at least one primer having at least one sequence section that is complementary to one or more sequences of the known polynucleic acids, (c) sequencing the mixture (a), wherein, simultaneously with or after completion of the sequencing reaction, (d) a common signal spectrum is recorded for all sequencing products, (e) the recorded signal spectrum is compared with the sequences and/or the signal spectra of the known polynucleotide sequences, and (f) the individual polynucleic acids of the mixture are identified and quantified.

Method according to claim 1, wherein (i) the complex mixture of polynucleic acids contains at least two different polynucleic acids that differ at preferably at least two, more preferably at least five, six, seven, eight, nine, or ten sequence positions, either in the form of a nucleotide exchange or by insertion or deletion of one or more nucleotides, and/or (ii) individual sequence profiles have been created for the individual sequences to be detected, and/or (iii) the complex nucleic acid mixtures are DNA molecules, RNA molecules, or mixtures or derivatives thereof, and/or (iv) the enzymes are selected from DNA polymerases, reverse transcriptases, and auxiliary enzymes for quantifying the detection reaction, and/or (v) standards of known individual sequences, for which a library of individual sequence profiles has been created, are further added to the sequencing reaction mixture, and/or (vi) nucleotides or nucleotide derivatives are further added in step (b), and/or (vii) the sequencing reaction mixture contains polynucleic acids isolated directly from polynucleic-acid-containing organisms or tissues.

Method according to claim 1 or 2, wherein the sequencing is carried out by a method selected from pyrosequencing, sequencing of single-stranded templates with dideoxynucleotides, cycle sequencing of single- or double-stranded templates with dideoxynucleotides, or other enzymatic or chemical sequencing methods, whereby the sequencing reaction leads to reproducibly quantifiable detection of the individual sequenced nucleotide positions.

Method according to one or more of claims 1 to 3, wherein the at least one primer (i) has a length of at least 12, preferably 18 to 25, nucleotides and (ii) has one or more sequence sections that are complementary to sections of known polynucleic acids.

Method according to one or more of claims 1 to 4, wherein the mixture to be analyzed is additionally subjected, before the sequencing reaction, to an amplification reaction, preferably a PCR reaction or an amplification of RNA by means of RNA polymerases.

Method according to one or more of claims 1 to 5, wherein the polynucleic acids are RNA molecules and the sequencing is carried out by pyrosequencing using a reverse transcriptase.

Method according to one or more of claims 1 to 5, wherein the polynucleic acids are DNA molecules and the sequencing is carried out by pyrosequencing.

Method according to one or more of claims 1 to 7, wherein the polynucleic acid mixture to be sequenced is derived from a mixture of organisms in order to determine their composition qualitatively and quantitatively.

Method according to one or more of claims 1 to 7, wherein the polynucleic acid mixture to be sequenced is derived from individual organisms or tissues in order to determine the composition of the DNA and/or RNA nucleic acid fractions qualitatively and quantitatively.

Method according to one or more of claims 1 to 9, in which ribosomal RNA in the polynucleic acid mixture is used as the sequencing template.

Method according to one or more of claims 1 to 9, in which messenger RNA in the polynucleic acid mixture is used as the sequencing template, wherein (i) primers are used that are complementary to the poly-A region of the mRNA or the oligo-dT region of the cDNA and contain one or more specific nucleotides at the 3' end, and/or (ii) primers are used that are partially or fully complementary to specifically selected mRNA sequences.

Kit for carrying out the method according to one or more of claims 1 to 11, in particular containing (i) suitable primers for carrying out the sequencing reaction and/or (ii) suitable primers for carrying out the amplification and/or (iii) suitable enzymes for carrying out the sequencing reaction and/or (iv) suitable chemicals for carrying out the sequencing reaction and/or (v) suitable controls for carrying out the sequencing reaction and/or (vi) suitable controls for carrying out the amplification and/or (vii) a library of sequence profiles of the individual sequences to be detected and/or (viii) a computer program for carrying out the necessary calculations.