DE102010061280A1

DE102010061280A1 - Method for technically realizable restructuring of data, involves storing contents of two components of tuples in memory as nodes of trees to be built, respectively, and outputting data structure formed from trees

Info

Publication number: DE102010061280A1
Application number: DE102010061280A
Authority: DE
Inventors: same as applicant
Original assignee: Individual
Current assignee: Individual
Priority date: 2010-12-16
Filing date: 2010-12-16
Publication date: 2012-06-21

Abstract

The method involves reading tuples of data and storing the contents of two components of the tuples in a memory as nodes of trees to be built, where the contents are each represented by one of the nodes of the respective tree. A data structure formed from the trees is output. The data is encoded in an XML document. The components of the tuples are formed by fields of the tuples. Modified tuples are formed for each datum of the lists of a third component. The restructuring request is provided as a command in a query language, based on the restructuring operation stroke. An independent claim is also included for a computer program with program code for executing the steps of a method for technically realizable restructuring of data.

Description

The present invention relates to a method for the technically feasible restructuring of data arranged in tuples with a plurality of components. The method according to the invention is particularly suitable for restructuring data encoded in XML documents. The invention also relates to a computer program with which this method can be executed on a conventional computer.

DE 101 57 996 B4 discloses a method for adaptive query evaluation on XML-based catalogs, in which an XPath query specified for a global catalog is applied to local catalogs. Each subquery of the XPath query is subjected to one or more transformations on the local catalogs.

DE 100 47 338 C2 discloses a method for data compression of structured documents which starts from a structured document in XML text form. In this method, a structure of a schema consisting of at least one structured element is built up. On the basis of the structure underlying the schema, instructions with relative addresses are determined such that a data structure described by the structured document is produced. The instructions with the relative addresses represent the structured document in compressed form.

DE 197 43 266 C1 discloses a method for adding or removing an address in a partially populated, unbalanced search tree. With each valid entry in the search tree, two pointers can be stored, each pointing to a valid entry of a lower level. Under that pointer of the entry to be removed which refers to the entry of the next lower level with the higher value, the lower-level entry with the lowest value is located. This entry is moved to the position of the entry to be removed and takes over the pointers of the removed entry.

DE 197 43 267 C1 discloses a method for looking up an address in a partially populated, unbalanced binary tree. In this method, on each descent to the next lower level of the tree, the entries addressed are compared with the address being searched for to check for a match.

In the article by Benecke, K. and Li, X.: "A Restructuring Operation for XML Documents", Technical Report, Otto von Guericke University Magdeburg, Faculty of Computer Science, No. FIN-009-2009, May 19, 2009, a restructuring operation for XML documents is presented. The restructuring operation, named stroke, is a command of the query language OttoQL. By simply specifying a target schema, this operation allows a hierarchy within hierarchically structured XML documents to be modified. For example, a first group of data can be subordinated to a second group of data even if, in the source structure, the first group is superordinate to the second. For this purpose, lists in the sense of functional languages are generated. A disadvantage of this restructuring operation is that, for example, inserting an element requires a number of comparisons that depends on the number of list entries, so that restructuring is very inefficient for lists with hundreds or thousands of entries. With large amounts of data, such restructuring is practically no longer feasible, since it quickly exceeds the performance of conventional computers. The cited article is regarded as the closest prior art to the invention. With regard to the restructuring method described there, it is expressly incorporated into the disclosure of the present patent application so that a detailed description of the previously known method can be dispensed with.

Starting from the aforementioned prior art, the object of the present invention is to provide an improved restructuring method, in particular a method for restructuring large amounts of hierarchically structured data that can be realized with conventional technical means.

The stated object is achieved by a method according to the appended claim 1. The method is preferably implemented as a computer program that can be executed on conventional hardware, e.g. a personal computer.

The method according to the invention serves for the technically feasible restructuring of hierarchically structured data arranged in tuples with a plurality of components. The tuples are ordered collections of values and are preferably structured alike, so that they have the same number of components. The components may, for example, be fields serving as attributes of the tuples. However, the components can also be formed by fields located within sub-tuples of the tuples. The data are preferably present as an XML file stored on a data carrier, but they can also be transmitted as an XML data stream. The restructuring is performed in accordance with a restructuring request, according to which contents of the j-th components of the tuples are to be subordinated to contents of the i-th components. The restructuring request is preferably a command formulated in a query language for XML databases, for example OttoQL. The variables i and j each designate a particular component within the tuples, for example always the first and the second component. All j-th components of the tuples are preferably of the same type, and likewise all i-th components are preferably of the same type. The subordination is to be understood in particular as a sorting of the data in which all those j-th components are grouped together whose enclosing tuples have the same i-th component.
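To make the intended result of such a restructuring request concrete, the following minimal sketch groups the j-th components under equal i-th components using a plain Python dictionary. It only illustrates the target structure, not the tree-based method of the invention. The function name `subordinate`, the 0-based indices, the sample data and the choice to keep duplicates are assumptions made for illustration.

```python
def subordinate(tuples, i, j):
    """Group the j-th components under equal i-th components (0-based indices)."""
    grouped = {}
    for t in tuples:
        # All j-th components sharing the same i-th component end up together.
        grouped.setdefault(t[i], []).append(t[j])
    # Sorting keys and values mirrors the sorting effect described in the text.
    return {key: sorted(values) for key, values in sorted(grouped.items())}


# Hypothetical sample data: (course, student)
rows = [("math", "Ann"), ("physics", "Bob"), ("math", "Cid")]

by_course = subordinate(rows, 0, 1)   # students subordinated to courses
by_student = subordinate(rows, 1, 0)  # courses subordinated to students
```

Swapping `i` and `j` inverts the hierarchy, which is the kind of change the restructuring request expresses.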

The method according to the invention comprises several steps which are to be carried out in a technical system, in particular in a data processing system. In one of the steps, the contents of the i-th components of the tuples are stored in nodes of a first tree to be built, preferably by an electronic, optical or magnetic write operation. These contents are preferably stored by writing the contents of the i-th components of all tuples one after another into a memory, in particular a volatile data memory. The first tree to be built is a tree in the sense of graph theory and is preferably balanced. The contents of the i-th components are stored in such a way that identical contents are each represented by exactly one node of the first tree, so that tuples with identical i-th components produce only a single node of the first tree. In a further step of the method, the contents of the j-th components of the tuples are stored in nodes of further trees to be built, preferably by writing the contents of the j-th components of all tuples one after another into a memory, in particular a volatile data memory. The further trees are likewise trees in the sense of graph theory and are preferably balanced. The j-th components of those tuples that have an identical i-th component are each stored in a single one of the further trees. Consequently, each of the further trees contains the contents of the j-th components of all those tuples whose i-th components are identical. Each further tree is associated with that node of the first tree which represents, i.e. also contains, the content of the respective i-th component. Consequently, the j-th components of all tuples with an identical i-th component are stored within one of the further trees, and this tree is associated with the node of the first tree that represents the content of those identical i-th components. The two steps mentioned are preferably carried out in a common pass in which the contents of the i-th and j-th components are stored tuple by tuple. In a further step of the method, the data structure formed from the several associated trees is output, preferably as a file, for example a file encoded according to OCaml.

Compared with the prior art, the method according to the invention permits a technically far more efficient restructuring of hierarchically structured data. For example, inserting a further tuple requires only a number of comparisons corresponding to the height of the first tree being built, whereas in the list-based solutions of the prior art this number depends on the number of list entries. Inserting a further element into a list with 1,000 entries, for instance, requires 500 comparisons on average, whereas the method according to the invention requires only about log₂ 1000 ≈ 10 comparisons. For typical database sizes, the method according to the invention is up to 1000 times more efficient than the prior art, which for the first time makes such restructuring of hierarchically structured data technically feasible.
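The figures in this comparison can be reproduced with a short calculation. The sketch below counts the comparisons of a linear scan through a sorted list of 1,000 entries and contrasts the average with the height of a perfectly balanced binary tree of the same size. The helper name `linear_insert_comparisons` and the choice of test values are assumptions for illustration; a real AVL tree may be somewhat taller than the perfectly balanced case assumed here.

```python
import math


def linear_insert_comparisons(sorted_list, value):
    """Count comparisons needed to find the insert position by scanning."""
    count = 0
    for item in sorted_list:
        count += 1
        if value <= item:
            break
    return count


n = 1000
entries = list(range(n))

# Average over one probe per existing entry, so every position is hit once.
average_linear = sum(linear_insert_comparisons(entries, v) for v in entries) / n

# Height of a perfectly balanced binary tree holding n entries.
tree_height_bound = math.ceil(math.log2(n))
```

Under these assumptions the linear scan needs roughly n/2 comparisons on average, while the tree needs about log₂ n.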

The steps of storing the contents of the i-th components and of storing the contents of the j-th components of the tuples, taken together, preferably comprise several sub-steps. In a first sub-step, a first tuple of the data is read in and the content of its i-th component is stored in a first node of the first tree to be built. The content of the j-th component of the first tuple is stored in a first node of a second tree to be built, which is associated with the first node of the first tree. In a second sub-step, a second tuple of the data is read in. If the content of the i-th component of the second tuple differs from the content of the first node of the first tree, it is stored in a second node of the first tree, while the content of the j-th component of the second tuple is stored in a first node of a third tree to be built, which is associated with the second node of the first tree. If the content of the i-th component of the second tuple equals the content of the first node of the first tree, the content of the j-th component of the second tuple is stored in a second node of the second tree; the content of the i-th component of the second tuple is then not stored in a further node of the first tree, since it is already represented by the first node of the first tree. In a further sub-step, a further tuple of the data is read in. The content of its i-th component is stored in a further node of the first tree if it differs from the contents of the nodes already present in the first tree. In this case, the content of the j-th component of the further tuple is stored in a first node of a further tree to be built, which is associated with the further node of the first tree. If the content of the i-th component of the further tuple equals one of the contents of the nodes already present in the first tree, the content of the j-th component of the further tuple is stored as a further node of the tree associated with that node of the first tree whose content equals the content of the i-th component of the further tuple. The sub-step of reading in a further tuple is repeated until all data, i.e. all tuples, have been read in and their contents stored.
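The sub-steps above can be sketched as a tree whose nodes each carry an associated further tree. For brevity this sketch uses plain, unbalanced binary search trees rather than the balanced trees the description prefers. The names `Node`, `insert`, `restructure` and `in_order`, the 0-based indices, the sample data, and the choice to collapse equal j-th contents within a further tree are all assumptions for illustration.

```python
class Node:
    """Binary search tree node; `subtree` is the root of the associated tree."""

    def __init__(self, content):
        self.content = content
        self.left = None
        self.right = None
        self.subtree = None


def insert(root, content):
    """Insert content unless present; return (root, node holding the content)."""
    if root is None:
        node = Node(content)
        return node, node
    current = root
    while True:
        if content == current.content:
            return root, current          # equal content: reuse the node
        side = "left" if content < current.content else "right"
        child = getattr(current, side)
        if child is None:
            node = Node(content)
            setattr(current, side, node)
            return root, node
        current = child


def restructure(tuples, i, j):
    """Build the first tree over i-th contents, one associated tree per node."""
    first = None
    for t in tuples:                      # tuples are read one after another
        first, node = insert(first, t[i])
        node.subtree, _ = insert(node.subtree, t[j])
    return first


def in_order(root):
    """Yield nodes in ascending order of content."""
    if root is not None:
        yield from in_order(root.left)
        yield root
        yield from in_order(root.right)


data = [("b", 2), ("a", 5), ("b", 1), ("a", 5), ("c", 3)]
tree = restructure(data, 0, 1)
output = [(n.content, [m.content for m in in_order(n.subtree)])
          for n in in_order(tree)]
```

An in-order traversal of the first tree, followed by an in-order traversal of each associated tree, yields the restructured data in sorted order.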

The contents of the i-th components of the tuples are preferably stored in the nodes of the first tree, and the contents of the j-th components in the nodes of the further trees, in such a way that the first tree and the further trees are built as balanced trees, particularly preferably as AVL trees. Building AVL trees makes the method particularly efficient, since the number of comparisons needed to build and modify the trees is minimized. In further preferred embodiments of the invention, the contents are stored in such a way that the first tree and the further trees are built as B-trees or as red-black trees. When the trees are built as balanced trees, it must be ensured that storing further contents of the i-th components in the nodes of the first tree, or further contents of the j-th components in the nodes of the further trees, does not destroy the balance property of the respective tree. The person skilled in the art knows suitable procedures for maintaining the balance property when storing data in trees, e.g. rotations in AVL trees and node splitting in B-trees.
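As an illustration of how rotations preserve the balance property, here is a minimal sketch of textbook AVL insertion. It is not taken from the patent; the names `AVLNode`, `avl_insert`, `rotate_left` and `rotate_right` are assumptions, and equal keys are simply ignored so that each content is represented by exactly one node.

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y      # x becomes the new subtree root
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x       # y becomes the new subtree root
    update(x)
    update(y)
    return y


def avl_insert(node, key):
    """Insert key and rebalance on the way back up; return the subtree root."""
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    else:
        return node                   # equal content: no new node
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                   # left-heavy
        if key > node.left.key:       # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                  # right-heavy
        if key < node.right.key:      # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


root = None
for key in range(1, 1001):            # sorted input degrades an unbalanced tree
    root = avl_insert(root, key)
```

Sorted input would turn an unbalanced search tree into a chain of 1,000 nodes, whereas the rotations keep the AVL tree's height logarithmic in the number of entries.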

To build the first tree and the further trees as balanced trees, the content of the i-th component of the tuple read in is preferably stored in a further node of the first tree by appending this further node at a lower end of the first tree if the first tree thereby remains balanced, or otherwise by making this further node superordinate to the nodes already present in the first tree. In the same way, the content of the j-th component of the tuple read in is preferably stored in a further node of the respective tree by appending this further node at a lower end of the respective tree if that tree thereby remains balanced, or otherwise by making this further node superordinate to the nodes already present in the respective tree.

A particular embodiment of the method according to the invention is intended for restructuring data in which, in addition, k-th components of the tuples are each formed by a list of data. The restructuring request further specifies that contents of the lists of the k-th components are to be subordinated to the contents of the j-th components. In this embodiment, a step in which modified tuples are formed is carried out before the remaining steps of the method. A modified tuple is formed for each datum of the lists of the k-th components; each modified tuple comprises the respective datum from the respective list together with the i-th component and the j-th component of the tuple containing that list. Consequently, as many modified tuples are produced as the lists of the k-th components have entries in total. The remaining steps of the method are then carried out with the modified tuples taking the place of the tuples.
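The formation of modified tuples amounts to flattening the list-valued component. The following sketch shows one way to do this; the function name `expand`, the 0-based indices, the order of components in the modified tuple and the sample data are assumptions for illustration.

```python
def expand(tuples, i, j, k):
    """Form one modified tuple (i-th, j-th, datum) per datum of each k-th list."""
    modified = []
    for t in tuples:
        for datum in t[k]:
            modified.append((t[i], t[j], datum))
    return modified


# Hypothetical sample data: (course, student, list of grades)
rows = [("math", "Ann", [1, 2]), ("math", "Bob", [3]), ("art", "Ann", [])]
modified = expand(rows, 0, 1, 2)
```

The number of modified tuples equals the total number of list entries, so a tuple with an empty list contributes none.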

A further particular embodiment of the method according to the invention is provided for restructuring data which, in addition to the aforementioned first tuples with the i-th components and the j-th components, comprise a further set of data arranged in further tuples having a plurality of components, where the further tuples may have a different structure from the first tuples. All first tuples and all further tuples can each be referred to as a collection. The further tuples have l-th components and m-th components, wherein the m-th components preferably match the j-th components of the first tuples in type and the l-th components of the further tuples preferably match the i-th components of the first tuples in type. The restructuring request further provides that contents of the m-th components of the further tuples are to be subordinated to contents of the l-th components of the further tuples. This particular embodiment further comprises a step in which the contents of the l-th components of the further tuples are stored in nodes of the first tree to be built, identical contents being represented by exactly one of the nodes of the first tree. Consequently, one node of the first tree may represent both the content of an i-th component of the first tuples and the content of l-th components of the further tuples. In a further step, the contents of the m-th components of the further tuples are stored in nodes of the further trees to be built, the m-th components of those further tuples that have an l-th component with the same content being stored in the one further tree that is associated with the node of the first tree representing the content of the respective l-th component. Consequently, each of the further trees may hold both j-th components of the first tuples and m-th components of the further tuples. The data of all tuples are thus stored within the same structure formed by the plurality of associated trees. Of course, this embodiment of the invention can be extended accordingly to structure data comprising three or more collections.
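
The merging of two collections into one structure can be sketched as follows. This is a simplified Python illustration under stated assumptions: a `dict` stands in for the first tree and a `set` for each associated further tree (the text describes trees, not hash containers), and the function name `insert_collection`, the field names and the sample values are hypothetical.

```python
from typing import Any, Dict, Hashable, Iterable, Set

Record = Dict[str, Any]


def insert_collection(
    structure: Dict[Hashable, Set[Hashable]],
    tuples: Iterable[Record],
    outer: str,
    inner: str,
) -> Dict[Hashable, Set[Hashable]]:
    """Subordinate the `inner` component to the `outer` component.

    Identical outer contents share one entry, so contents coming from
    different collections end up in the same associated container.
    """
    for t in tuples:
        structure.setdefault(t[outer], set()).add(t[inner])
    return structure


# Two collections with different tuple structures (hypothetical data).
first = [{"X": "a", "Y": 1}, {"X": "c", "Y": 2}]
further = [{"L": "a", "M": 3, "note": "extra"}, {"L": "g", "M": 4, "note": "extra"}]

structure: Dict[Hashable, Set[Hashable]] = {}
insert_collection(structure, first, "X", "Y")    # i-th and j-th components
insert_collection(structure, further, "L", "M")  # l-th and m-th components
```

After both calls, the entry for "a" holds contents from both collections, which mirrors the statement that one node of the first tree may represent contents of i-th and l-th components alike.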

The i-th components of the tuples are preferably formed by fields of the tuples. Alternatively or additionally, the j-th components of the tuples are also preferably formed by fields of the tuples.

In a further preferred embodiment of the method according to the invention, the i-th components of the tuples are formed by u-th fields within r-th subtuples of the tuples. Alternatively or additionally, the j-th components of the tuples are preferably formed by v-th fields within s-th subtuples of the tuples. The method according to the invention is thus applicable to data with any hierarchical structure. The i-th components and the j-th components may also be formed by fields within sub-subtuples in subtuples of the tuples. The i-th components and the j-th components of the tuples can be located at any hierarchy depth of the tuples.
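
Selecting a component at an arbitrary hierarchy depth can be sketched as follows. This is a minimal Python illustration; the function name `component_values`, the path notation and the nested sample tuple are assumptions made for this example and do not come from the text.

```python
from typing import Any, Iterator, Sequence


def component_values(tup: Any, path: Sequence[str]) -> Iterator[Any]:
    """Yield every content reachable along `path` in a nested tuple.

    A list encountered on the way is treated as a collection of subtuples
    (or of plain data) and is traversed element by element.
    """
    if not path:
        yield tup
        return
    value = tup[path[0]]
    rest = path[1:]
    if isinstance(value, list):
        for sub in value:
            yield from component_values(sub, rest)
    else:
        yield from component_values(value, rest)


# Hypothetical nested tuple with one subtuple collection.
nested = {
    "ID": 1,
    "PARTS": [
        {"NAME": "resistor", "VALUE": 100},
        {"NAME": "capacitor", "VALUE": 47},
    ],
}
top_level = list(component_values(nested, ["ID"]))
in_subtuples = list(component_values(nested, ["PARTS", "NAME"]))
```

The same function serves fields at the top level and fields inside subtuples, so the restructuring steps can be applied regardless of where the chosen components sit.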

In a preferred embodiment of the method according to the invention, the i-th components and the j-th components of the tuples each contain data about technical elements, such as electronic components, machine elements or structural elements. Preferably, the data in the i-th components and in the j-th components each describe a feature of the technical elements, the data in the i-th components each describing a feature of a first kind and the data in the j-th components each describing a feature of a second kind. The features of the first and second kind may be, for example, a length, a mass, a force, an electrical resistance, an electrical capacitance or a mechanical load capacity.

The method according to the invention makes the restructuring of large amounts of data technically feasible, so that the data to be restructured preferably comprise more than 100, particularly preferably more than 10,000, and even more preferably more than 1,000,000 of the tuples. Accordingly, at least one of the trees to be built preferably reaches a height of 10, particularly preferably a height of 20.

Further details, advantages and developments of the invention emerge from the following description of preferred embodiments, with reference to the drawing. The figures show:

Fig. 1: a first example of a data structure resulting from carrying out a preferred embodiment of the method according to the invention; and

Fig. 2: a second example of a data structure resulting from carrying out the preferred embodiment of the method according to the invention.

Fig. 1 shows a data structure of the kind that results, by way of example, from carrying out a preferred embodiment of the method according to the invention. This data structure can be regarded as a tree that comprises several associated trees. Single arrows denote edges of a first tree. Lines without arrowheads denote edges of further trees. Double arrows denote associations between a node of the first tree and one of the further trees.

The starting point for the example shown in Fig. 1 is an XML file DAT.XML in which data are encoded in several tuples:

For illustration, the data encoded in this XML document are listed below in tabular form:

These data can be represented in OCaml as follows (for reasons of space, only the data of the first two tuples are shown):

The command gib available in the query language OttoQL is based on the restructuring operation stroke, which allows data to be restructured. The restructuring is explained using the following example:

aus doc("dat.xml")
gib M(X,M(Y))# =Set(X,SET(Y))

This restructuring command subordinates all Y fields of the tuples to the X fields of the tuples. The result is the following XML structure:

For illustration, the data encoded in this XML structure are listed below in tabular form:

The restructuring shown was carried out by applying a preferred embodiment of the method according to the invention. This produces a data structure as shown in Fig. 1. The X fields of the tuples contain the contents "a", "c" and "g", which are stored in the nodes of the first tree. Although several tuples contain an "a" in the X field, the first tree comprises only one node with the content "a". A further tree is associated with each node of the first tree, each further tree containing at least one node. The contents of the Y fields of the tuples are stored in these further nodes. Each further tree holds all the contents found in the Y fields of those tuples whose X field has the content of the node of the first tree associated with that further tree.
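
The structure of a first tree whose nodes are each associated with a further tree can be sketched as follows. This is a minimal Python illustration under stated assumptions: it uses plain binary search trees without balancing, the names `Node`, `insert`, `in_order` and `restructure` are hypothetical, and the Y values are invented sample data (only the X contents "a", "c" and "g" come from the text).

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Node:
    content: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    assoc: Optional["Node"] = None  # root of the associated further tree


def insert(root: Optional[Node], content: Any) -> Tuple[Node, Node]:
    """Insert content without duplicates; return (tree root, node for content)."""
    if root is None:
        node = Node(content)
        return node, node
    cur = root
    while True:
        if content == cur.content:
            return root, cur  # identical content is represented only once
        side = "left" if content < cur.content else "right"
        nxt = getattr(cur, side)
        if nxt is None:
            node = Node(content)
            setattr(cur, side, node)
            return root, node
        cur = nxt


def in_order(root: Optional[Node]) -> Iterator[Node]:
    """Visit the nodes of a tree in ascending order of content."""
    if root is not None:
        yield from in_order(root.left)
        yield root
        yield from in_order(root.right)


def restructure(pairs: Iterable[Tuple[Any, Any]]) -> List[Tuple[Any, List[Any]]]:
    """Build the associated trees for M(X,M(Y)) and list their contents."""
    first: Optional[Node] = None
    for x, y in pairs:
        first, x_node = insert(first, x)
        x_node.assoc, _ = insert(x_node.assoc, y)
    return [(n.content, [m.content for m in in_order(n.assoc)]) for n in in_order(first)]


# Hypothetical (X, Y) pairs; "a" occurs in several tuples.
pairs = [("a", 2), ("c", 5), ("a", 1), ("g", 3), ("a", 2)]
result = restructure(pairs)
# result == [("a", [1, 2]), ("c", [5]), ("g", [3])]
```

The `assoc` reference plays the role of the double arrows in Fig. 1, while `left` and `right` correspond to the edges within a tree.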

The data structure shown in Fig. 1 can be described by the following expression in OCaml:

Fig. 2 shows, by way of example, a further data structure resulting from carrying out the preferred embodiment of the method according to the invention. This data structure is drawn in the same way as the data structure shown in Fig. 1.

The starting point for the example shown in Fig. 2 is an XML file "studenten.tab" in the form of a table with student data:

In this example, the names of the students are to be structured and sorted by sex. The required restructuring command in OttoQL is:

aus doc("studenten.tab")
gib M(SEX,B(NAME))

The source structure consists of eight tuples with data about students. Each tuple has the type
(STID,NAME,VORNAME,FAK,MATRIK,SEX,L(FACH,NOTE),L(HOBBY)) and has an STID segment of the type
(STID,NAME,VORNAME,FAK,MATRIK,SEX) as well as subordinate FACH and HOBBY subtuples. Since the FACH subtuple contains no collections, it is identical to the FACH segment (type (FACH,NOTE)). The same applies to the HOBBY subtuple. According to the preferred embodiment of the method according to the invention, the first student record, namely the first tuple, is first inserted into an empty target structure with the schema M(SEX,B(NAME)), i.e. a SEX segment "M" with a subordinate NAME segment "Meier" is formed. Two nodes of the structure to be built from several associated trees are thus already present, and these two nodes are still present in the fully built structure shown in Fig. 2. Since the STID segments can be inserted into all segments of the target structure, the FACH and HOBBY segments do not need to be considered further. In a next step, the data of the second tuple are inserted. For this purpose, the associated SEX value is compared with the SEX value already present in the target structure, and it is determined that a new node with the value "F" must be inserted. The value "Mueller" is then inserted into the empty NAME multiset, i.e. a node with this content is attached, which forms the first node of a tree associated with the node "F". The data of the third tuple are inserted in the same way. Since the associated SEX value "M" is already present, only the NAME value "Schulz" has to be added to this collection by attaching a further node with this value to the tree associated with the node "M".

It is in this insertion of values into a set (M) or a multiset (B = bag) that the method according to the invention differs from the previously known prior-art methods as used in OttoQL. In the previously known methods, M collections as well as B and L collections were implemented as lists in the sense of functional programming languages. As a result, sorting and eliminating duplicates in M and B collections was very costly, especially for large collections. Because the invention represents the M and B collections of the target structure as balanced binary trees, inserting data and, where applicable, eliminating duplicates are considerably more efficient, so that the invention makes these restructurings technically feasible for the first time even for larger amounts of data.
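
The three insertion steps described above can be sketched as follows. This is a minimal Python illustration and not the patented implementation: the text only states that balanced binary trees are used, so the AVL balancing scheme, the multiplicity counter for the bag and all names (`Node`, `insert`, `add_student`, and so on) are assumptions made for this example. The SEX and NAME values are those of the first three tuples named in the text.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Tuple


@dataclass
class Node:
    key: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    height: int = 1
    count: int = 1                  # multiplicity, used for bags (B)
    assoc: Optional["Node"] = None  # root of the associated further tree


def height(n: Optional[Node]) -> int:
    return n.height if n else 0


def update(n: Node) -> Node:
    n.height = 1 + max(height(n.left), height(n.right))
    return n


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    return update(x)


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    return update(y)


def balance(n: Node) -> Node:
    """Restore the AVL property at n after an insertion below it."""
    update(n)
    diff = height(n.left) - height(n.right)
    if diff > 1:
        if height(n.left.left) < height(n.left.right):
            n.left = rotate_left(n.left)
        return rotate_right(n)
    if diff < -1:
        if height(n.right.right) < height(n.right.left):
            n.right = rotate_right(n.right)
        return rotate_left(n)
    return n


def insert(root: Optional[Node], key: Any, bag: bool = False) -> Tuple[Node, Node]:
    """Insert key; return (new root, node holding key)."""
    if root is None:
        node = Node(key)
        return node, node
    if key == root.key:
        if bag:
            root.count += 1  # a bag keeps duplicates, a set does not
        return root, root
    if key < root.key:
        root.left, node = insert(root.left, key, bag)
    else:
        root.right, node = insert(root.right, key, bag)
    return balance(root), node


def in_order(n: Optional[Node]) -> Iterator[Node]:
    if n:
        yield from in_order(n.left)
        yield n
        yield from in_order(n.right)


def add_student(target: Optional[Node], sex: str, name: str) -> Node:
    target, sex_node = insert(target, sex)                      # M(SEX ...)
    sex_node.assoc, _ = insert(sex_node.assoc, name, bag=True)  # B(NAME)
    return target


target = None
for sex, name in [("M", "Meier"), ("F", "Mueller"), ("M", "Schulz")]:
    target = add_student(target, sex, name)

result = [
    (s.key, [n.key for n in in_order(s.assoc) for _ in range(n.count)])
    for s in in_order(target)
]
# result == [("F", ["Mueller"]), ("M", ["Meier", "Schulz"])]
```

Storing a multiplicity per node is one way to keep duplicates in a bag while leaving the tree shape independent of how often a value occurs; the text does not say how the bag is represented.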

The described restructuring results in the structure shown in Fig. 2. The result can be output in tabular form as follows:

With the method according to the invention, further data can also be inserted efficiently in the course of a restructuring. If, for example, the value of a further NAME field is to be inserted, at most three comparisons with the values already entered are required. With the list-based methods according to the prior art, six comparisons may be necessary. The difference becomes clearer with realistic database sizes. If, for example, a further value is to be inserted into a collection of 1,000 values, the tree of the structure produced according to the invention has a height of only 10, so that at most ten comparisons are required, whereas a list-based method according to the prior art requires up to 1,000 comparisons.
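
The comparison counts quoted above can be reproduced with a short calculation. This is a minimal Python sketch assuming a perfectly balanced binary tree and a linear scan through a list; the function names are hypothetical.

```python
def max_comparisons_tree(n: int) -> int:
    """Height of a perfectly balanced binary tree holding n >= 1 values."""
    return n.bit_length()  # equals ceil(log2(n + 1))


def max_comparisons_list(n: int) -> int:
    """Worst case for a linear scan through a list of n values."""
    return n


for n in (6, 1000):
    print(n, max_comparisons_tree(n), max_comparisons_list(n))
# prints "6 3 6" and "1000 10 1000"
```

The tree bound grows logarithmically with the number of stored values, while the list bound grows linearly, which is why the gap widens for larger collections.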

Compared with the prior art, the method according to the invention allows a technically much more efficient restructuring of hierarchically structured data, so that such a restructuring becomes practically feasible for the first time even for larger amounts of data. The results of comparative measurements are shown below, in which restructurings were carried out both with a list-based method according to the prior art and with the method according to the invention. The comparative measurements were made on a personal computer with an Intel Core Duo E 6750 processor at 2.66 GHz and 4 × 1024 MB RAM.

In a first comparative measurement, data comprising 1,000 tuples were restructured. Restructuring the data according to the restructuring request gib M(X,M(Y,M(Z))) took 0.047 s with a method according to the prior art, whereas it took only 0.031 s with the method according to the invention. Restructuring the data according to the restructuring request gib M(X,M(Y,Z)) took 0.094 s with a method according to the prior art, whereas it took only 0.031 s with the method according to the invention. Restructuring the data according to the restructuring request gib M(X,Y,Z) took 0.344 s with a method according to the prior art, whereas it took only 0.025 s with the method according to the invention.

In a second comparative measurement, data comprising 1,700 tuples were restructured. Restructuring the data according to the restructuring request gib M(STID,M(COURSE,M(MARK))) took 0.25 s with a method according to the prior art, whereas it took only 0.047 s with the method according to the invention. Restructuring the data according to the restructuring request gib M(STID,M(COURSE,MARK)) took 0.27 s with a method according to the prior art, whereas it took only 0.046 s with the method according to the invention. Restructuring the data according to the restructuring request gib M(STID,COURSE,MARK) took 5.6 s with a method according to the prior art, whereas it took only 0.049 s with the method according to the invention.

In a third comparative measurement, data comprising 300 tuples with 300 subtuples each were restructured. Restructuring the data according to the restructuring request gib M(X,M(Y)) took 13.7 s with a method according to the prior art, whereas it took only 0.85 s with the method according to the invention. Restructuring the data according to the restructuring request gib M(Y,M(X)) took 31.2 s with a method according to the prior art, whereas it took only 1.76 s with the method according to the invention. Restructuring the data according to the restructuring request gib M(Y,X) took 2646.6 s with a method according to the prior art, whereas it took only 2.3 s with the method according to the invention. Restructuring the data according to the restructuring request gib M(X,Y) took 2243.6 s with a method according to the prior art, whereas it took only 1.99 s with the method according to the invention.

In a fourth comparative measurement, data consisting of three files were restructured, the first file comprising 100 flat tuples, the second file 1,700 flat tuples and the third file 600 flat tuples. Restructuring the data according to the restructuring request gib T1,M(STID,(NAME,SEX,LOCATION,FAC)?,M(MARK,COURSE),M(HOBBY)) took 1.12 s with a method according to the prior art, whereas it took only 0.25 s with the method according to the invention.
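
The speed-up factors implied by the four comparative measurements can be derived as follows. This is a minimal Python sketch; the timings are the values reported above, while the dictionary layout and variable names are assumptions made for this example, and the long fourth query is abbreviated to a hypothetical label.

```python
# (measurement number, query) -> (prior-art seconds, inventive-method seconds)
timings = {
    (1, "M(X,M(Y,M(Z)))"): (0.047, 0.031),
    (1, "M(X,M(Y,Z))"): (0.094, 0.031),
    (1, "M(X,Y,Z)"): (0.344, 0.025),
    (2, "M(STID,M(COURSE,M(MARK)))"): (0.25, 0.047),
    (2, "M(STID,M(COURSE,MARK))"): (0.27, 0.046),
    (2, "M(STID,COURSE,MARK)"): (5.6, 0.049),
    (3, "M(X,M(Y))"): (13.7, 0.85),
    (3, "M(Y,M(X))"): (31.2, 1.76),
    (3, "M(Y,X)"): (2646.6, 2.3),
    (3, "M(X,Y)"): (2243.6, 1.99),
    (4, "three-file query"): (1.12, 0.25),  # label abbreviates the fourth request
}

# Speed-up factor: prior-art time divided by the time of the inventive method.
speedup = {key: old / new for key, (old, new) in timings.items()}

for (measurement, query), factor in sorted(speedup.items(), key=lambda kv: kv[1]):
    print(f"measurement {measurement}: {query:28s} {factor:8.1f}x")
```

Sorting by factor makes it easy to see that the largest gains occur for the flat target schemas of the third measurement.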

CITATIONS INCLUDED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the better information of the reader. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

DE 10157996 B4 [0002]
DE 10047338 C2 [0003]
DE 19743266 C1 [0004]
DE 19743267 C1 [0005]

Cited non-patent literature

Benecke, K. and Li, X.: "A Restructuring Operation for XML Documents", Technical Report No. FIN-009-2009, Otto von Guericke University Magdeburg, Faculty of Computer Science, May 19, 2009 [0006]

Claims

A method for technically feasible restructuring of data arranged in tuples with multiple components, according to a restructuring request by which contents of the j-th components of the tuples are to be subordinated to contents of the i-th components of the tuples, comprising the following steps: - storing the contents of the i-th components of the tuples in a memory as nodes of a first tree to be built, wherein identical contents are represented by in each case one of the nodes of the first tree; - storing the contents of the j-th components of the tuples in the memory as nodes of further trees to be built, wherein the j-th components of those tuples that have an i-th component with the same content are stored in the one further tree that is associated with the node of the first tree representing the content of the respective i-th component; and - outputting the data structure formed from the several associated trees.

A method according to claim 1, characterized in that storing the contents of the i-th components of the tuples and storing the contents of the j-th components of the tuples comprises the following substeps: - reading in a first tuple of the data and storing the content of the i-th component of the first tuple in a first node of the first tree to be built, the content of the j-th component of the first tuple being stored in a first node of a second tree to be built that is associated with the first node of the first tree; - reading in a second tuple of the data and storing the content of the i-th component of the second tuple in a second node of the first tree and the content of the j-th component of the second tuple in a first node of a further tree to be built that is associated with the second node of the first tree, if the content of the i-th component of the second tuple differs from the content of the first node of the first tree, or storing the content of the j-th component of the second tuple in a second node of the second tree, if the content of the i-th component of the second tuple equals the content of the first node of the first tree; - reading in a further tuple of the data and storing the content of the i-th component of the further tuple in a further node of the first tree and the content of the j-th component of the further tuple in a first node of a further tree to be built that is associated with that further node of the first tree, if the content of the i-th component differs from the contents of the nodes of the first tree, or storing the content of the j-th component of the further tuple as a further node of the tree that is associated with the node of the first tree whose content equals the content of the i-th component of the further tuple; and - repeating the aforementioned substep until all data are stored in the memory.

A method according to claim 1 or 2, characterized in that the contents of the i-th components of the tuples are stored in the nodes of the first tree and the contents of the j-th components of the tuples are stored in the nodes of the further trees in such a way that the trees are built as balanced trees.

Method according to claim 3 when dependent on claim 2, characterized in that: - the content of the i-th component of the tuple read in is stored in a further node of the first tree in each case by attaching this further node to a lower end of the first tree, if the first tree thereby remains balanced, or otherwise by making this further node superordinate to the already existing nodes of the first tree; and - the content of the j-th component of the tuple read in is stored in a further node of the respective tree in each case by attaching this further node to a lower end of the respective tree, if the respective tree thereby remains balanced, or otherwise by making this further node superordinate to the already existing nodes of the respective tree.

Method according to one of claims 1 to 4, characterized in that k-th components of the tuples are each formed by a list of data, wherein, according to the restructuring request, contents of the lists of the k-th components are to be subordinated to the contents of the j-th components, for which purpose the following step is performed before the remaining steps: - forming modified tuples for each datum of the lists of the k-th components, each modified tuple comprising the respective datum from the respective list and the i-th component and the j-th component of the tuple comprising the respective list; the remaining steps of the method being performed for the modified tuples.

Method according to one of claims 1 to 5, characterized in that it is further adapted to the restructuring of further data arranged in further tuples with multiple components, wherein, according to the restructuring request, contents of the m-th components of the further tuples are additionally to be subordinated to contents of the l-th components of the further tuples; further comprising the following steps: - storing the contents of the l-th components of the further tuples in nodes of the first tree to be built, wherein identical contents are represented by in each case one of the nodes of the first tree; and - storing the contents of the m-th components of the further tuples in nodes of the further trees to be built, wherein the m-th components of those further tuples that have an l-th component with the same content are stored in the one further tree that is associated with the node of the first tree representing the content of the respective l-th component.

Method according to one of claims 1 to 6, characterized in that the i-th components of the tuples and/or the j-th components of the tuples are formed by fields of the tuples.

Method according to one of claims 1 to 7, characterized in that the i-th components of the tuples are formed by u-th fields within r-th sub-tuples of the tuples.

Method according to one of claims 1 to 8, characterized in that the j-th components of the tuples are formed by v-th fields within s-th sub-tuples of the tuples.

Method according to one of claims 1 to 9, characterized in that the data comprise at least 1,000 of the tuples.

Computer program with program code means for performing all the steps of the method according to one of claims 1 to 10 during the execution of the program on a computer.

Computer program with program code means according to claim 11, which are stored on a computer-readable medium.