EP1397757A2 - Dispositif de restructuration de donnees pour hierarchies aplanissantes - Google Patents
Dispositif de restructuration de donnees pour hierarchies aplanissantesInfo
- Publication number
- EP1397757A2 EP1397757A2 EP00993551A EP00993551A EP1397757A2 EP 1397757 A2 EP1397757 A2 EP 1397757A2 EP 00993551 A EP00993551 A EP 00993551A EP 00993551 A EP00993551 A EP 00993551A EP 1397757 A2 EP1397757 A2 EP 1397757A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- hierarchy
- data structure
- hierarchical data
- flattened
- child
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Definitions
- the present invention relates to database processing in general and hierarchical database processing in particular.
- trans actional data is collected as transactions occur and is then periodically aggregated into an analysis form to allow decision makers to glean information from the collected data.
- a manufacturer might collect transactions relating to sales, ordering and production.
- An example of such a transaction is a sales record, indicating an amount of the sale, the salesperson making the sale, the items sold, the office associated with that salesperson, and other related data.
- these transaction records are often aggregated into multi-dimensional tables for analysis, to provide informative views of the organization's data.
- a long list of thousands of sales records is not likely to be as useful to a decision maker as a report aggregating those sales records by geographical region, dates, types of items, salesperson, or other "dimension".
- One approach to dealing with the volume of data created by an organization is to periodically populate an enterprise resource planning (ERP) database with transaction data stored in a multidimensional (MD) database.
- ERP enterprise resource planning
- MD multidimensional
- An analyst or other decision maker could then query the MD database to obtain a "slice" of the data along a particular dimension to glean information about the data.
- an analyst might query the MD database to obtain a report showing the total sales for each state for each month to get a breakdown of the annual U.S. sales of the organization.
- the analyst is requesting a slice of the aggregated data in the MD database along the geographic and time dimensions.
- the dimensions are hierarchical.
- the ERP system needs to store business hierarchies, such as a cost center hierarchy, a cost element hierarchy or an organization reporting hierarchy.
- the ERP system would include a hierarchy to indicate which cost centers belong in which sales region, which field offices are allocated to which cities, along with a hierarchy of cities, states and countries.
- Such hierarchies can be stored in tables within the MD database system as a collection of child-parent nodes.
- the MD data is organized by geography and sales data is aggregated by salesperson.
- the hierarchy indicates, for each salesperson, an associated field office and each field office is associated with a sales region and each sales region is associated with a country. If an analyst queries a transactional database system to provide a report of sales by country, the transactional database system would have to examine each record in the transactional database system's table of sales records, and accumulate the results. In a real-time environment, where the analyst expects a real-time response to the query, such processing is difficult, and often impossible, to achieve in realtime. With an MD database, however, such processing is more easily done, because the data is aggregated.
- the MD database system When the analyst makes the same query of an MD database system, the MD database system only needs to get the aggregated records and total them up according to the dimensions and slice indicated by the query. Thus, in response to a query for sales in a country of interest, the MD database system would query a geographical hierarchy table to identify which sales regions are in the country of interest, query the geographical hierarchy table to identify which field offices are in those sales regions, query the geographical hierarchy table to identify which salespersons are associated with those field offices, then query the MD database for sales by salesperson and total the sales returned.
- This process of issuing multiple queries and summarizing the results takes time and will slow down the response to analyst queries, which can be a problem where the analyst is analyzing data in real-time, issuing queries, reviewing the responses and issuing other queries in response to the responses, in an effort to understand the organization's data.
- improved performance of an MD database querying system would be a great benefit to analysts and organizations with large amounts of data to be analyzed.
- a flattener In order to improve performance of a database querying system where a hierarchical structure is used in generating a response to a query, a flattener generates a horizontal representation or vertical representation of a transitive closure of a hierarchical data structure.
- the flattened representation can be generated in advance of any queries, so that the database querying system can more quickly respond to a query.
- the flattened representations are only generated as often as the hierarchical structure changes and are stored, so that a flattened representation does not need to be generated for each query.
- a flattened representation represents the transitive closure of the hierarchical structure, in that all ancestor-descendant ' pairings are represented in the flattened structure. While the flattened structure typically stores the hierarchy much less efficiently than a standard parent-child hierarchy table, a precalculated flattened structure can improve the response times for responses to queries that require the hierarchical information.
- FIG. 1 is a block diagram of a computer system according to one embodiment of the present invention.
- Fig. 2 illustrates a hierarchy
- Fig. 2(a) is a graph of the hierarchy
- Fig. 2(b) shows a table data structure representing the hierarchy.
- Fig. 3 is an illustration of a vertically flattened representation of the hierarchy shown in Fig. 2.
- Fig. 4 is an illustration of a horizontally flattened representation of the hierarchy shown in Fig. 2.
- Fig. 5 shows a table data structure representing an extended hierarchy, with child attributes stored in parent-child pairing records.
- Fig. 6 shows a table data structure representing an extended hierarchy, with parent attributes and child attributes stored in parent-child pairing records; the table of Fig. 6 also illustrates how parent-child pairings are stored when a child node is a child of more than one parent node.
- Fig. 7 illustrates the conversion of a single node hierarchy into a transitively closed hierarchy table.
- Fig. 8 illustrates a hierarchy.
- Fig. 9 illustrates _ n extended hierarchy;
- Fig. 9(a) shows a table data structure representing the hierarchy;
- Fig. 9(b) shows a table data structure representing the corresponding vertically flattened representation of the hierarchy.
- Fig. 10 illustrates another extended hierarchy
- Fig. 10(a) shows a table data structure representing the hierarchy
- Fig. 10(b) shows a table data structure representing the corresponding horizontally flattened representation of the hierarchy.
- a flattened hierarchy data structure is created for a hierarchical data structure so that the computer system can respond to queries more quickly.
- a single query is all that is needed to obtain all the information about a hierarchy that is needed for responding to a query on the hierarchical data.
- Fig. 1 illustrates such a computer system 10 according to one embodiment of the present invention.
- Computer system 10 is shown comprising a transactional data system 20 that maintains transactional data 22, such as OLTP (online transaction processing) data.
- Transactional data system 20 is coupled to a data warehouser 30 that extracts, transforms and loads the transactional data into an aggregated data set 32.
- Aggregated data set 32 and associated metadata, including a hierarchical data structure 34, are provided to a flattener 40 and an analysis system 50, such as an OLAP (online analysis system) such as an SAP R/3 database system.
- OLAP online analysis system
- SAP R/3 database system SAP R/3 database system
- Aggregated data set 32 is stored in an MD (multidimensional) database 52 in the format expected by analysis system 50.
- Hierarchical data structure 34 might be provided to analysis system 50 for storage as a parent-child table 56, but where the parent-child data is not separately needed, hierarchical data structure 34 can be provided only to flattener 40, which generates a flattened hierarchy from the hierarchical data structure 34 and provides that flattened hierarchy to analysis system 50 for storage as a flattened hierarchy table in table structure 54.
- analysis system could store hierarchical data structure 34 as parent-child table 56 and provide that table to flattener 40 for generation of the flattened hierarchy.
- Table structure 54 comprises vertically flattened hierarchies or horizontally flattened hierarchies, or both.
- An example of a simple hierarchy 80 is shown in Fig. 2.
- Fig. 2(a) shows the hierarchy is graph form. As shown there, hierarchy 80 is made up of four nodes and three edges connecting those nodes. In a typical computer system, the hierarchy is likely to have many more nodes and possibly more levels.
- the nodes are labeled "A", "B", "C” and "D”. Node A is the root node, node B is an intermediate node and nodes C and D are leaf nodes.
- Hierarchy 80 is represented in a database system as a table, such as table 82 shown in Fig. 2(b). Table 82 can be a relational database table.
- FIG. 3 illustrates a flat structure 84V that might result as a vertical flattening of table 82
- Fig. 4 illustrates a corresponding horizontally flattened hierarchy 84H.
- each possible ancestor-descendant pairing is represented by a record in the structure, including the degenerate cases where a node is paired with itself.
- structure 84V is generated by flattener 40
- flattener 40 adds the fields for the columns shown in Fig. 3 (DEPTH, ANCESTOR_ROOT and DESCEND ANT_LEAF).
- the DEPTH field indicates the number of nodes between the ancestor and the descendant, including the descendant.
- the ANCESTOR_ROOT field indicates whether or not the ancestor referenced by the row is a root node, i.e., has no parent, while the DESCEND ANT_LEAF field indicates whether or not the descendant referenced by the row is a leaf node, i.e., has no children.
- These fields are useful for operations that are specific to root or leaf nodes. It should be understood that the structures and operations described herein apply to tree structures (single root, each child being a child of exactly one parent) as well as forest structures (possibly more than one tree, possibly more than one root per connected graph and multiple parent nodes for some child nodes).
- Fig. 5 illustrates a more real-world example.
- an organization stores its sales data with a geographical dimension.
- a part of the corresponding hierarchy is shown as parent-child table 100 in Figure 5(a).
- CA California
- NY New York
- Earth worldwide
- the process of determining worldwide sales requires querying table 100 to determine the children of the root node record (Earth), then querying table 100 to determine the children of those children, and so on.
- the multidimensional database can be queried for all data matching those root, intermediate and leaf nodes.
- a flattened hierarchy such as vertically flattened hierarchy table 105 V shown in Fig. 5 (b)
- querying to determine the relevant nodes can be done in one query.
- a horizontally flattened hierarchy table is used, such as table 105H shown in Fig. 5 (c)
- the query of table 105H requires knowledge of the "level" at which the data is being sliced.
- table 105H also typically requires advanced knowledge of the maximum level for any leaf node.
- the maximum level is two, so the columns labeled "" and "LEVEL_n” could be replaced with one column labeled "LEVEL_3" containing all nulls, or the LEVEL_3 column could be omitted and the leaf node status of the last two records in table 105H could be assumed by the fact that there is no column beyond LEVEL_2.
- nodes of a hierarchy include attributes.
- the children and parents are the same objects (i.e., have the same structure for their attributes).
- a heterogeneous schema is one where different nodes have different schema.
- a homogeneous hierarchy can be represented in a relational table as a set of parent-child records, such as the relational table shown in Fig. 2(b) or Fig. 5(b).
- Information about a node could be stored in MD database 52, but where the information is specific to a node as it is positioned in its hierarchy, the preferred approach is to store the information as an attribute in the hierarchical data.
- Fig. 6 shows a table 110 that is a relational table representing an extended hierarchy (an extended parent- child table) with attributes of the node stored in the table.
- Each record represents a parent- child pairing and the attributes for the child node of the pair are stored with the record.
- the attributes of the child node depend on the pairing.
- a field labeled "Rank" is included in the table.
- the rank values found in the Rank column represent the ranking, in size, of the child among other siblings (children of a common parent).
- each child node is connected by an edge to only one parent node, then the ranking of the child is fixed and only one record per child is needed in table 110', but where a child node connects to more than one parent node (e.g., child node "Russia” connects to parent node “Earth” and parent node "Asia”), multiple records might be needed in table 110' to handle differing values for attributes that depend on which pairing is being examined.
- transitive closure is a complete list of ancestors-descendant pairings.
- a complete list would include all possible pairings of a node with itself, to accommodate the degenerate case of a single node hierarchy.
- transitive closure table 116 results from the flattening process.
- FIG. 9 A more complex, and more typical, flattening is shown in Fig. 9.
- a two-level hierarchy with attributes is represented as table 120 shown in Fig. 9(a).
- the corresponding vertically flattened hierarchy is shown as table 125V in Fig. 9(b).
- the horizontally flattened hierarchy is represented by a table having a set of rows with one row for each edge and one column for each level or attribute. This type of representation is typically useful for normalized hierarchies - hierarchies where the depth (number of levels) is the same for most branches.
- the vertically flattened hierarchy is represented by a table having a separate row for each ancestor-descendent pair (including pairings of a node with itself). This type of representation is typically used for non-normalized hierarchies - hierarchies where the depth is very different for different branches.
- each row would include the parent atributes for the ancestor and the child attributes for the descendent.
- Fig. 10(a) shows a parent-child table 130 that includes the parent attribute fields "PS" (PoliticalSystem) and the child attribute fields "PE" (PoliticalEntity) and "CR” (Child Rank, in population).
- PS Parent-System
- PE PolyEntity
- CR Child Rank, in population
- the root node attribute is stored in the "PtypeO" field
- level 1 child attributes are stored in the "Ctypel” and “Crankl” fields
- level 1 parent attributes are stored in the "Ptypel” field
- level 2 child attributes are stored in the "Ctype2" and “Crank2” fields.
- level 2 parent attributes because none of the nodes at level 2 are parent nodes; they are all leaf nodes.
- level 0 child attributes because level 0 is the root level and therefore there are no children nodes at that level.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Dans un système de base de données informatisées, les demandes sont appliquées à des données stockées qui comprennent au moins une hiérarchie représentant des divisions de ces données stockées et les réponses à ces demandes dépendent de cette hiérarchie. On entre une structure hiérarchique de données représentant des appariements parent-enfant et on génère une structure hiérarchique de données aplanie représentant une fermeture transitive de cette hiérarchie représentée par cette structure hiérarchique de données. On utilise cette structure hiérarchique de données aplanie pour générer la réponse à une demande.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US46826399A | 1999-12-20 | 1999-12-20 | |
US468263 | 1999-12-20 | ||
PCT/US2000/035183 WO2001046866A2 (fr) | 1999-12-20 | 2000-12-20 | Dispositif de restructuration de donnees pour hierarchies aplanissantes |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1397757A2 true EP1397757A2 (fr) | 2004-03-17 |
Family
ID=23859107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP00993551A Withdrawn EP1397757A2 (fr) | 1999-12-20 | 2000-12-20 | Dispositif de restructuration de donnees pour hierarchies aplanissantes |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1397757A2 (fr) |
AU (1) | AU2912701A (fr) |
CA (1) | CA2395244A1 (fr) |
WO (1) | WO2001046866A2 (fr) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4878178B2 (ja) * | 2006-02-28 | 2012-02-15 | 株式会社日立製作所 | データ処理方法および装置並びにその処理プログラム |
FR2920898B1 (fr) | 2007-09-11 | 2010-07-30 | Marc Vogel | Installation de gestion d'une base de donnees |
US8825700B2 (en) * | 2008-05-26 | 2014-09-02 | Microsoft Corporation | Paging hierarchical data |
US9824121B2 (en) * | 2012-08-10 | 2017-11-21 | Arris Enterprises Llc | Aggregate data streams in relational database systems |
US10895973B2 (en) * | 2019-04-23 | 2021-01-19 | Google Llc | Browsing hierarchical datasets |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5915257A (en) * | 1994-10-11 | 1999-06-22 | Brio Technology, Inc. | Cross tab analysis and reporting method |
-
2000
- 2000-12-20 AU AU29127/01A patent/AU2912701A/en not_active Abandoned
- 2000-12-20 CA CA002395244A patent/CA2395244A1/fr not_active Abandoned
- 2000-12-20 WO PCT/US2000/035183 patent/WO2001046866A2/fr not_active Application Discontinuation
- 2000-12-20 EP EP00993551A patent/EP1397757A2/fr not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
See references of WO0146866A2 * |
Also Published As
Publication number | Publication date |
---|---|
WO2001046866A2 (fr) | 2001-06-28 |
CA2395244A1 (fr) | 2001-06-28 |
WO2001046866A3 (fr) | 2003-12-24 |
AU2912701A (en) | 2001-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9535970B2 (en) | Metric catalog system | |
US9092467B2 (en) | Systems and methods for displaying data in split dimension levels | |
US20210073188A1 (en) | System and method for automatic inference of a cube schema from a tabular data for use in a multidimensional database environment | |
US8200618B2 (en) | System and method for analyzing data in a report | |
US7480663B2 (en) | Model based optimization with focus regions | |
US9075859B2 (en) | Parameterized database drill-through | |
US20070250524A1 (en) | Method and apparatus for workload and model based materialized query table or view recommendation technique | |
US20070061287A1 (en) | Method, apparatus and program storage device for optimizing a data warehouse model and operation | |
US7698314B2 (en) | Computer-implemented systems and methods for providing a counting measure | |
US20050010550A1 (en) | System and method of modelling of a multi-dimensional data source in an entity-relationship model | |
WO2001042987A1 (fr) | Construction recursive dynamique pour bases de donnees multidimensionnelles et procedes et appareil associes | |
KR101829198B1 (ko) | 보고서의 중요도를 분석하는 메타 데이터 기반 온라인 분석 프로세싱 시스템 | |
US9002845B1 (en) | Systems and methods of analyzing changes and data between hierarchies | |
US7636709B1 (en) | Methods and systems for locating related reports | |
WO2001046866A2 (fr) | Dispositif de restructuration de donnees pour hierarchies aplanissantes | |
Jarke et al. | Multidimensional data models and aggregation | |
US20220012242A1 (en) | Hierarchical datacube query plan generation | |
Breitner | Data Warehousing and OLAP: Delivering Just-In-Time Information for Decision Support | |
Gallo et al. | Data warehouse design and management: theory and practice | |
Chaturvedi et al. | On-line analytical processing in agriculture using multidimensional cubes | |
Agrawal et al. | Analysis of multidimensional modeling related to conceptual level | |
Francett | Decisions, decisions: users take stock of data warehouse shelves. | |
Bog et al. | Enterprise Data Management for Transaction and Analytical Processing | |
Srinivasa | Query processing issues in data warehouses | |
Khan et al. | Development and Evaluation of Measuring Parameters for Data Warehouse |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20020722 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
AX | Request for extension of the european patent |
Extension state: AL LT LV MK RO SI |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20050701 |