DE10156379A1

DE10156379A1 - Method for automatic databanks administration, polling by users of databanks is detected online

Info

Publication number: DE10156379A1
Application number: DE10156379A
Authority: DE
Inventors: Karin Haese; Ludger Seiling
Original assignee: Mummert & Partner Unternehmens
Current assignee: Mummert & Partner Unternehmens
Priority date: 2001-11-16
Filing date: 2001-11-16
Publication date: 2003-06-05

Abstract

A method for the automatic administration of a first database (1), which holds data from at least one second database (2, 3) and in which the data can be modeled according to at least two given attributes (Fi; Dj) in a given dimensional schema. Queries by users on the data in the first database (1) and/or the at least one second database (2, 3) are automatically captured and evaluated, and the first database is automatically changed by adapting the given dimensional schema to the user queries according to given rules. The user queries on the first database (1) and/or the second database (2, 3) are captured online. Independent claims are also given for: (1) a device for carrying out the automatic administration method; (2) a computer program for carrying out the method; (3) a data carrier for the computer program; (4) a database.

Description

The present invention relates to a method and a device for the automatic administration of a database and of database systems, in particular data marts, i.e. database systems in the front-end area of data warehouse database systems, and also data warehouses themselves.

In the following, the term data warehouse refers to a database that is isolated from the operational IT systems, serves as the company-wide data basis for all kinds of management support systems, and is characterized by a strict separation of operational and decision-support data and systems. It is to be distinguished from database systems developed for program-controlled, defined transactions on data, the so-called operational OLTP (Online Transaction Processing) systems, and from other database systems that are built individually and used for day-to-day operations (so-called legacy systems). An overview of the different types of databases and their typical forms of data organization is given in Figs. 1 and 2, in anticipation of the description of the figures.

A company's data warehouse (DWH) merges and integrates data from a wide variety of operational database systems. The data is physically stored in the data warehouse according to a logical schema modeled largely without redundancy (a so-called normalized schema). The logical database schema describes the data structure independently of the hardware and software components of the overall system. The physical schema is derived from the logical schema by an automatic procedure for the given hardware and software environment. Logical schemas can be modeled according to various criteria. For the data warehouse, redundancy-free modeling is chosen in order to keep the required disk space low. As a consequence of this redundancy-free storage, user accesses in which different data records have to be related to one another (i.e. joined and queried), for example the sum of the sales of a product over a certain period, take a relatively long time. For this reason, one or more data marts are placed in front of the data warehouse; they hold the data from the data warehouse (or parts of it) in a form that is more favorable for the user. The (normal) user then communicates exclusively with the data marts and not with the data warehouse. Access to the data warehouse is reserved for a predetermined group of experts.

The data is thus loaded from the data warehouse into one or more data marts according to the users' requirements. Data marts are therefore generally (but not necessarily) smaller database units than the data warehouse. Data marts are modeled with the aim of delivering fast query results tailored to specific user needs. As a rule, a data mart has the logical structure of a dimensional model in the form of a star schema (cf. Fig. 6a) or a snowflake schema (cf. Fig. 6b). These logical schemas relate data describing the objects of evaluation (facts, e.g. revenue, units sold) to data describing the perspectives of evaluation (dimensions, e.g. geography, time, products): facts are grouped in the core table of the schema, dimensions are grouped in the surrounding tables, and the two are linked in a star shape, cf. Fig. 6a. The snowflake schema is derived from the star schema by normalizing those dimension tables that contain hierarchically related dimension attributes, cf. Fig. 6b. The dimensional data model, i.e. both the star schema and the snowflake schema, contains a logical description of the structure of attributes A_i, where A_i can be a fact attribute F_i or a dimension attribute D_i, in the form of tables and their relationships (represented by connections). This kind of data structure increases the efficiency of user queries; for example, the access time for querying facts via the dimensions directly linked to them is reduced considerably. Query efficiency is a decisive criterion when using databases of the data mart type, which are characterized in particular by a large number of user queries involving aggregations.
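The star schema described above can be illustrated with a small sketch. It is not taken from the patent: the table names (fact_sales, dim_product, dim_time), the columns and the sample rows are assumptions chosen for illustration, and Python's built-in sqlite3 module merely stands in for whatever DBMS hosts the data mart.

```python
import sqlite3

# In-memory database standing in for a data mart (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: perspectives of evaluation
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Core fact table: one foreign key per surrounding dimension table
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        time_id    INTEGER REFERENCES dim_time(time_id),
        revenue    REAL,
        units_sold INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_time    VALUES (1, 2001, 10), (2, 2001, 11);
    INSERT INTO fact_sales  VALUES (1, 1, 100.0, 10), (1, 2, 150.0, 15), (2, 1, 80.0, 4);
""")


def revenue_by_product(connection):
    """Aggregate the fact 'revenue' over the dimension 'product' with a single join."""
    return connection.execute("""
        SELECT p.product_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.product_name
        ORDER BY p.product_name
    """).fetchall()


for name, total in revenue_by_product(conn):
    print(name, total)
```

The aggregation needs only one join from the fact table to the dimension directly linked to it, which is the property the dimensional model relies on for short query times.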

However, both the basic composition (the "what", i.e. the attributes) and the internal structure (the "how", i.e. the grouping of attributes into tables and the links between tables) of a data mart's schema can vary over time as user requirements change. A data mart is usually loaded from the data warehouse at periodic intervals, and the load may be accompanied by a change to the schema.

In the prior art, the schema of a data mart is adapted to changed user requirements by manual analysis at the business level, manual incorporation into the data processing concept, and manual maintenance and incorporation of the changes into the logical and physical dimensional data model.

The object of the invention is therefore to provide a method and a device for the automated administration of a database (DB), each of which takes changes in the behavior of the database's users into account. This object is achieved by a method and a device according to the independent claims. Advantageous developments are defined in the dependent claims.

According to the invention, a method is thus provided for automatically administering a first database, wherein the first database holds data from at least one second database, wherein the data in the first database can be modeled according to at least two predetermined attributes in a predetermined dimensional schema, wherein the at least two attributes can represent facts and dimensions and the schema represents a star-shaped link between one fact attribute at the center and at least one dimension attribute around the center, and wherein the users' queries on the data in the first database and/or the at least one second database are automatically captured and evaluated and the first database is automatically changed by adapting the predetermined dimensional schema to the users' queries according to predetermined rules.

The invention accordingly allows an initial dimensional data model (i.e. the dimensional schema) to be adapted automatically. The automatic adaptation is triggered by a change in user behavior, for instance when previous standard queries are no longer issued and new queries are issued more frequently.

A particularly fast adaptation can be achieved if the users' queries on the data in the first database and/or the at least one second database are captured online.

Furthermore, the queries are formulated according to a standard. This makes the method compatible with known database systems and query tools. In particular, it also makes it easy to identify attributes in the queries and/or in sequences of queries.

Attributes of the first database can be attributes themselves as well as expressions over attributes, i.e. they can also be formed by transformations and/or operative combinations of attributes of the first database and/or the at least one second database. The occurrence probabilities of attributes and of their combinations in a query and/or in sequences of queries are evaluated for the automatic design or adaptation of a dimensional schema in such a way that the schema can be optimized for short query times. For this purpose, the query logging in both the first and the at least one second database is used. It is advantageous, and therefore common practice, to reserve queries on the at least one second database for a predetermined group of users (experts), since that database is generally not optimized for access time.

The schema is changed according to the users' query behavior, which is known after an intelligent (e.g. also rule-based) evaluation of the logs of standardized user database queries. Physically, changes in the logical schema can mean changes to foreign keys and/or changes to columns in the first database. The invention thus controls the database management system (DBMS) so that attribute tables are created in a query-optimized way. The DBMS in turn controls how the table structure and the data are stored on the computer's hard disks.

This process of data model adaptation is shown in Fig. 3. Fig. 4 contrasts the conventional method with that of the invention.

The invention also comprises one or more computer programs for carrying out the method described above, as well as computer-readable data carriers for the computer program or programs.

The invention also comprises at least one first database, in particular of the data mart type, and possibly further databases of the data warehouse or legacy system type, comprising:

a) an interface for user queries in a predetermined query language;
b) a running database management system for executing user queries and logging the queries;
c) metadata that assigns the property fact or dimension to each attribute in the first database.

According to the invention, the manual maintenance and modification of the data model and of the data storage in the data mart that result from changed user requirements are no longer necessary. The dimensional data model and the data storage of the data mart are adapted automatically to user behavior. Changes in user behavior are caused above all by a shift in the focus of data analysis. Analysis priorities also change, for example, when new business insights and methods or new market models or structures emerge, such as those arising from the use of new technologies, e.g. e-commerce.

As the focus of analysis shifts, the information requests made to the data mart change, i.e. previous standard queries are no longer issued and new queries are issued more frequently. Fast query results are therefore achieved by adapting the data storage to the changed analysis goals. According to the invention, changes to the schema are therefore made in line with changes in the users' analysis priorities.

The invention is described in more detail below by means of examples together with the drawing. The drawing shows:

Fig. 1 a data warehouse database system;

Fig. 2 a classification of databases and their data models;

Fig. 3 a process model of the automated data model adaptation;

Fig. 4 a comparison of the known procedure and the method according to the invention;

Fig. 5 the different levels of a database system in which the invention can be implemented;

Fig. 6a a star schema;

Fig. 6b a snowflake schema;

Fig. 7 users and their access options for data at levels 1 and 2;

Fig. 8 a Nassi-Shneiderman flow diagram for generating the fact/dimension attribute connections;

Fig. 9 learned relationships between fact and dimension attributes;

Fig. 10 a Nassi-Shneiderman flow diagram of the algorithm for grouping dimension attributes into dimension tables; and

Fig. 11 a grouping of fact/dimension attributes into fact/dimension tables by means of artificial intelligence methods.

Figs. 1 to 4 and Figs. 6a and 6b have already been explained above. Fig. 5 shows the different levels of a database system in which the invention can be implemented. Level 1 contains at least one database DB A, but N databases, DB A to DB N, are possible. These can be one or more data warehouses or legacy databases. This means that at least one database system exists at level 1 and is in production use. In particular, software for managing the database system (DBMS) runs in the database. While the data at level 1 is usually already stored in a redundancy-free or transaction-oriented form, the data that is to be stored at level 2 according to the principles of dimensional or OLAP (Online Analytical Processing) data modeling is only determined by the method according to the invention. The associated dimensional schema (OLAP schema) is likewise developed by the invention.

To generate the schema, the users' standardized queries (e.g. in Structured Query Language, SQL) on the databases of levels 1 and 2 according to Fig. 5 are logged and analyzed.

In a further development of the invention, some user groups, the so-called experts, are given a view of and access to all data at level 1 (i.e. the data warehouse or OLTP system) in addition to the data at level 2 (i.e. the data mart). This is shown in Fig. 7. Experts can thus move freely through the entire company data space when the level 2 data modeling no longer meets new requirements raised by innovative questions.

Changes in user behavior can be identified because the query language is standardized, as is the case with SQL, for example. There are several variants of the SQL query language. Without loss of generality, and for the purpose of explaining the method, the method is described here using only the most original SQL standard as an example.

According to the standard, an SQL query can contain six clauses, of which only the first two are mandatory. The clauses are listed below; optional clauses are shown in square brackets:

1. SELECT <attribute and function list>
2. FROM <table list>
3. [WHERE <condition>]
4. [GROUP BY <grouping attribute(s)>]
5. [HAVING <group condition>]
6. [ORDER BY <attribute list>]

These SQL queries are logged in a log file, e.g. with the file name "system.log". The SQL standard makes it possible to identify and extract the attributes in (1.), (4.) and (6.) by string comparison. The attributes in (1.) can become fact or dimension elements, while the other clauses mainly contain dimension elements. The classification of attributes into fact elements F_i and dimension elements D_i is assumed to be given.
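One possible way to extract the attributes from clauses (1.), (4.) and (6.) of a logged query is sketched below. The patent does not specify the string comparison in detail, so the regular expressions, the function names and the metadata dictionary ATTRIBUTE_KIND are assumptions made for illustration; the sketch ignores subqueries and string literals that contain SQL keywords.

```python
import re

# Hypothetical metadata: the fact/dimension classification is assumed to be given.
ATTRIBUTE_KIND = {
    "revenue": "fact",
    "units_sold": "fact",
    "region": "dimension",
    "year": "dimension",
}

# Keywords that terminate a clause in the simplified six-clause query form.
CLAUSE_KEYWORDS = r"FROM|WHERE|GROUP\s+BY|HAVING|ORDER\s+BY"


def clause_body(query, clause):
    """Return the text of one clause up to the next clause keyword, or ''."""
    pattern = rf"\b{clause}\b(.*?)(?=\b(?:{CLAUSE_KEYWORDS})\b|;|$)"
    match = re.search(pattern, query, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else ""


def extract_attributes(query):
    """Collect known fact and dimension attributes from SELECT, GROUP BY and ORDER BY."""
    facts, dimensions = set(), set()
    for clause in ("SELECT", r"GROUP\s+BY", r"ORDER\s+BY"):
        for name in re.findall(r"[A-Za-z_]\w*", clause_body(query, clause)):
            kind = ATTRIBUTE_KIND.get(name.lower())
            if kind == "fact":
                facts.add(name.lower())
            elif kind == "dimension":
                dimensions.add(name.lower())
    return facts, dimensions


example = ("SELECT region, SUM(revenue) FROM sales "
           "WHERE year = 2001 GROUP BY region ORDER BY region")
print(extract_attributes(example))
```

Identifiers that are not listed in the metadata, such as the function name SUM, are dropped, and the WHERE clause is deliberately not inspected because only clauses (1.), (4.) and (6.) are named as attribute sources.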

According to the invention, the records in the log file are used to evaluate how frequently queries occur. "Fact-dimension" combinations that users query frequently are selected for implementation as a star or snowflake schema in the level 2 dimensional data model.
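The frequency evaluation can be sketched as a simple count over the logged queries. The input format (one pair of fact and dimension sets per query), the function name and the threshold min_share are assumptions for illustration; the patent does not state a particular selection threshold.

```python
from collections import Counter
from itertools import product


def frequent_pairs(parsed_queries, min_share=0.2):
    """Return the fact-dimension pairs whose share of all logged queries reaches min_share.

    parsed_queries: list of (facts, dimensions) pairs, one pair of sets per logged query.
    min_share: hypothetical selection threshold, not taken from the patent.
    """
    if not parsed_queries:
        return {}
    counts = Counter()
    for facts, dimensions in parsed_queries:
        # Every fact in a query is counted once with every dimension in the same query.
        counts.update(product(sorted(facts), sorted(dimensions)))
    total = len(parsed_queries)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_share}


log = [
    ({"revenue"}, {"region"}),
    ({"revenue"}, {"region", "year"}),
    ({"units_sold"}, {"year"}),
    ({"revenue"}, {"region"}),
    ({"revenue"}, {"region"}),
]
print(frequent_pairs(log, min_share=0.5))
```

In this hypothetical log, only the pair of revenue and region reaches the threshold and would be a candidate for the level 2 schema.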

According to one aspect of the invention, the fact-dimension combinations that are most efficient for processing the ad hoc queries are established by learning the "fact-dimension" attribute combinations that occur in the SQL queries. According to the invention, the attribute combinations are learned by the following method, which is shown as a flow diagram in Fig. 8. Consider the sets F and D of the fact attributes F_i and the dimension attributes D_j, where i and j number the attributes in an arbitrary but fixed order. At the start of learning it is assumed that no relationships V_ij exist between the fact attributes F_i and the dimension attributes D_j, i.e. the connections V_ij are initialized to zero. Learning proceeds by identifying the attributes in each ad hoc SQL query. The relationships V_ij are set in the form of connecting lines between the attributes that occur in the query. At the same time, the age A_V_ij of the connections is tracked, and for the connection V_ij that has just been set it is set to the predetermined value a. The value 1 can be chosen as the unit for a. In the next step, all connections older than a predetermined age limit a_max are deleted. As a result, connections are deleted that are no longer needed once the time span (age limit a_max) has elapsed, which corresponds to queries that users no longer direct at the data mart. Before the next SQL query is processed in the same way, the age A_V_ij of all existing connections is increased by one unit a.

The iterative method for defining the relations (relationships or connections) between fact and dimension attributes in the logical schema can thus be summarized as follows:

- for each query SELECT F_i by D_j:
  - identify the attributes F_i, D_j;
  - define a connection V_ij between the fact attribute F_i and the dimension attribute D_j and assign a predetermined age value a to the connection V_ij;
- for all other defined connections V_xy:
  - if the age of the connection V_xy is greater than a predetermined value (a_max): delete the connection V_xy;
  - otherwise: increase the age of the connection V_xy by the predetermined age value a.
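The steps summarized above could, as a minimal sketch, be written as follows. Representing each query as a (fact, dimension) pair and the connections as a dictionary mapping a pair to its age are assumptions of this sketch; the names `learn_connections`, `a` and `a_max` follow the symbols of the description but are otherwise hypothetical.

```python
def learn_connections(queries, a=1, a_max=3):
    """Return {(fact, dimension): age} after processing the queries in order.

    queries: iterable of (fact, dimension) pairs, one per "SELECT F_i by D_j".
    """
    age = {}  # no connections V_ij exist at the start of learning
    for fact, dim in queries:
        current = (fact, dim)
        # Define (or refresh) the connection V_ij and assign it the age value a.
        age[current] = a
        # For all other defined connections V_xy: delete or age them.
        for conn in list(age):
            if conn == current:
                continue
            if age[conn] > a_max:
                del age[conn]      # not queried for too long
            else:
                age[conn] += a     # one step older
    return age


history = [("F1", "D4")] + [("F2", "D5")] * 4
connections = learn_connections(history)
```

With these example values, the connection between F1 and D4 is dropped once it has gone unused for longer than `a_max` allows, while the connection between F2 and D5 is refreshed on every query. Whether a refreshed connection should be reset to `a`, as here, or handled differently is an interpretation of the summary above.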

By means of the above iterative method, two successive queries can also be considered in each case and, if the two queries differ in only one dimension attribute, the two dimension attributes D_j, D_k can be identified as belonging to one attribute table. This is explained in more detail later, in the further embodiment of the invention, with reference to FIG. 10. The result of the method is the set of relationships between the attributes in general, which must be taken into account in the dimensional OLAP data model in order to establish high-performance data storage. FIG. 9 shows a possible result of the learning process from FIG. 8 when applied to a query. FIG. 9 shows that the fact attributes F_1 to F_3 are related to a selection of the dimension attributes D_4 to D_9. The existence of a relationship is represented by a connecting line V_ij. The combination "fact attributes-dimension attributes" is not complete, because the experts analyzed some facts only over certain dimension attributes. The system is thus able to adapt to changed user requirements on its own, i.e. without human intervention.

In addition to the method described here for adaptively learning relationships between fact and dimension attributes, there are other ways in which these relationships can be found; cf., for example, Roweis, Saul: "Nonlinear Dimensionality Reduction by Locally Linear Embedding"; Tenenbaum, de Silva, Langford: "A Global Geometric Framework for Nonlinear Dimensionality Reduction".

In a further embodiment of the invention, individual attributes can also, as mentioned above, be combined into tables. These attribute tables serve to store together attributes that are hierarchically related (i.e. the attributes are contained in one another, as days are contained in a month) or simply logically related. According to the invention, this combining of the attributes into fact and dimension tables is likewise carried out automatically and on the basis of rules. The rules serve to check and supplement the automatic method. As above, it is assumed that each attribute has been assigned the property fact or dimension. This assignment is recorded in the metadata. All fact attributes are combined in the fact table. The dimension attributes are combined into dimension tables by the following automatic method.

The fact attributes F_i have already been identified. The automatic combining of dimension attributes into dimension tables is then based on the same query logs that were already used to establish the relationships between fact attributes F_i and dimension attributes D_j. The logs are again used for this purpose, by evaluating sequences of two queries. It is determined whether the later query differs from the preceding query in only one dimension attribute. This is the case, for example, with a so-called drill down or roll up. In technical terms, drill down means a refined online presentation of the fact at a lower hierarchy level of the dimension. Roll up describes the analogous process in the other direction, i.e. online presentation of the fact at a higher hierarchy level. In a drill down or roll up, the queries thus differ in only one dimension attribute. Example: if the fact attribute "sales" is extracted from the database over the dimension attribute "month" in the first query, and in the immediately following query the fact attribute "sales" is extracted over the dimension attribute "day", this is a drill down. The drill down or roll up can be interpreted as a sign of the hierarchical relationship between the two dimension attributes "month" and "day". Accordingly, the hierarchical and also the simply logical connections are obtained by means of the algorithm according to FIG. 10 whenever two successive SELECT queries differ in only one dimension attribute. The inputs to the algorithm are then exactly the dimension attributes D_i and D_j that differ in the otherwise identical queries. The Nassi-Shneiderman diagram for this method is shown in FIG. 10.
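The detection of two successive queries that differ in only one dimension attribute might be sketched as follows. This is not the algorithm of FIG. 10, which is not reproduced in the text; it only illustrates how its inputs, the two differing dimension attributes, could be found. The query representation (a fact name plus a tuple of dimension names) and the function name `differing_dimension_pairs` are assumptions of this sketch.

```python
def differing_dimension_pairs(queries):
    """Find pairs of dimension attributes from consecutive, otherwise identical queries.

    queries: list of (fact, dimensions) tuples in the order they were logged,
             where dimensions is a tuple of dimension attribute names.
    """
    pairs = []
    for (fact_a, dims_a), (fact_b, dims_b) in zip(queries, queries[1:]):
        if fact_a != fact_b:
            continue  # the queries must be identical apart from one dimension
        only_first = set(dims_a) - set(dims_b)
        only_second = set(dims_b) - set(dims_a)
        if len(only_first) == 1 and len(only_second) == 1:
            # e.g. a drill down from "month" to "day", or a roll up the other way
            pairs.append((only_first.pop(), only_second.pop()))
    return pairs


logged = [
    ("sales", ("month", "region")),
    ("sales", ("day", "region")),   # drill down: month -> day
    ("cost", ("region",)),          # different fact: ignored
]
related = differing_dimension_pairs(logged)
```

Each pair in `related` is a candidate for two dimension attributes that belong to the same dimension table. The sketch does not decide whether the relationship is hierarchical or merely logical.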

The following guidelines, which serve alongside the automatic method described in FIG. 10 to check or optimize the grouping of the dimension attributes in dimension tables, have proven particularly advantageous in the context of the invention:

1. A dimension table should not exceed the order of magnitude of 6 hierarchy levels and 1000 dimension elements.
2. Hierarchically related dimension attributes are identified by interpreting the metadata, whose industry standard enables automated, rule-based processing, and are combined in one dimension table.
3. Descriptive attributes and the associated identifying key attributes are already held together in one table in the DWH and are therefore also stored in one dimension table here.
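The first guideline lends itself to a simple automated check. The following is only a sketch: the function name `exceeds_size_guideline` and the idea of passing the two counts directly are assumptions, and the limits are treated as hard thresholds although the text speaks of an order of magnitude.

```python
# Limits taken from guideline 1; treating them as strict thresholds is an assumption.
MAX_HIERARCHY_LEVELS = 6
MAX_DIMENSION_ELEMENTS = 1000


def exceeds_size_guideline(hierarchy_levels, dimension_elements):
    """Return True if a proposed dimension table is larger than guideline 1 suggests."""
    return (hierarchy_levels > MAX_HIERARCHY_LEVELS
            or dimension_elements > MAX_DIMENSION_ELEMENTS)


# A time dimension with three levels (month, day, hour) and 744 elements stays within bounds.
time_table_too_large = exceeds_size_guideline(3, 744)
```

A grouping flagged by such a check could then be reviewed or split before the schema is implemented.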

An example of combining attributes into fact and dimension tables is shown in FIG. 11. The attributes D_5 to D_6 are hierarchically related, such as month, day, hour, and are therefore stored together in one table. According to the metadata, the attributes D_8 and D_9 are in a descriptive relationship: D_8 is the textual description of the numerical representation expressed in D_9 of, for example, the membership of persons in an automobile club.

Finally, the procedure and the methods of the invention generate a schema optimized for user queries, which can then be physically implemented and loaded in level 2 in a batch run, either completely or by considering differences, depending on the capabilities of the software tools. The invention thus controls a database management system so that it stores data on the hard disk of a computer in a query-optimized manner.
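Implementing the new schema "by considering differences" could, under stated assumptions, start from a comparison of the current and the newly generated schema. In the sketch below a schema is modeled simply as a mapping from table name to a set of column names; this representation and the name `schema_difference` are hypothetical and say nothing about how a particular software tool applies the changes.

```python
def schema_difference(current, target):
    """Compare two schemas given as {table name: set of column names}.

    Returns (added, removed): for each table, the columns that the target
    schema adds and the columns it no longer contains.
    """
    added = {}
    removed = {}
    for table in set(current) | set(target):
        old_cols = current.get(table, set())
        new_cols = target.get(table, set())
        if new_cols - old_cols:
            added[table] = new_cols - old_cols
        if old_cols - new_cols:
            removed[table] = old_cols - new_cols
    return added, removed


current_schema = {"fact": {"sales"}, "dim_time": {"month"}}
target_schema = {"fact": {"sales"}, "dim_time": {"month", "day"}}
added, removed = schema_difference(current_schema, target_schema)
```

Only the columns reported in `added` and `removed` would need to be touched in the batch run; a complete rebuild would ignore the difference and recreate every table.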

Under certain circumstances, certain user groups (e.g. the experts or analysts) want to access not only pure attributes of the data warehouse or of the existing data mart, but also to analyze expressions of attributes, such as transformations of these attributes and/or operative links between the attributes. In order to build the dimensional data model automatically with the desired expressions of attributes as well, a query language must be used with which the expert can express the wish to transform attributes and/or link them operatively. Logging the queries and then identifying the attributes or attribute expressions in the queries ultimately makes it possible to organize a dimensional data model with transformed facts or dimensions.

Claims

1. Method for automatic administration of a first database (1), the first database (1) holding data from at least one second database (2, 3), wherein
the data in the first database (1) can be modeled according to at least two predetermined attributes (F_i; D_j) in a predetermined dimensional scheme,
the at least two attributes can represent facts (F_i) and dimensions (D_j), and
the scheme represents a star-shaped link (V_ij, V_ik) of one fact attribute (F_i) in the center and at least one dimension attribute (D_j, D_k) around the center,
characterized in
that queries by users for the data in the first database (1) and/or the at least one second database (2, 3) are automatically recorded and evaluated, and the first database (1) is automatically changed by adapting the predetermined dimensional scheme to the queries of the users according to predetermined rules.

2. The method according to claim 1, characterized in that the queries of users for the data in the first database (1) and/or the at least one second database (2, 3) are recorded online.

3. The method according to claim 1 or 2, characterized in that attributes (F_i; D_j) of the first database (1) are formed by transformation and operative links of attributes from the first database (1) and/or the at least one second database (2, 3).

4. The method according to any one of the preceding claims, characterized in that the queries are standardized queries in a query language and/or sequences of standardized queries, wherein attribute combinations can be identified in the queries and/or sequences of queries.

5. The method according to claim 4, characterized in that the attribute combinations (V_ij) are identified using an iterative method and are deleted if they have no longer been queried by users for a predetermined time (a_max).

6. The method according to any one of claims 3 to 5, characterized in that the attributes are structured according to the combinations in which they occur in a query and/or according to sequences of combinations, and according to the occurrence probabilities of both, wherein a log of database queries is used which contains the information about attribute combinations and functions and/or the sequences of attribute combinations.

7. The method according to claim 5 or 6, characterized in that the iterative method for identifying the attribute combinations contains the following steps, wherein the individual attributes (F_i; D_j) can be arbitrarily numbered:

- for each query (SELECT F_i by D_j):
Identifying the attributes (F_i, D_j);
Defining a connection (V_ij) between the fact attribute (F_i) and the dimension attribute (D_j) and assigning a predetermined age value (a) to the connection (V_ij);

- For all other defined connections (V_xy):
if the age of the connection (V_xy) is greater than a predetermined value (a_max): deleting the connection (V_xy);
otherwise: increasing the age of the connection (V_xy) by the predetermined age value (a).

8. The method according to claim 7, characterized in that two consecutive queries are considered in each case and, provided the two queries differ in only one dimension attribute, the two dimension attributes (D_j, D_k) are combined into one attribute table.

9. The method according to any one of the preceding claims, characterized in that changes in the schema mean changes in foreign keys and/or changes to columns in the first database.

10. The method according to any one of the preceding claims, characterized in that changes in the schema correspond to changes in the users' main focus of analysis.

11. The method according to any one of the preceding claims, characterized in that querying of data in the at least one second database (2, 3) is reserved for a predetermined group of users, in particular experts.

12. Device for performing the method according to one of the preceding claims.

13. Computer program for performing the method according to one of claims 1 to 11.

14. Data carrier for a computer program according to claim 13.

15. Database (1; 2, 3), comprising:

a) an interface for user queries in a predetermined query language;

b) a running database management system for executing user queries and logging the queries;

c) metadata which assign the property fact (F) or dimension (D) to each attribute.