EP1397757A2 - Dispositif de restructuration de donnees pour hierarchies aplanissantes - Google Patents

Dispositif de restructuration de donnees pour hierarchies aplanissantes

Info

Publication number
EP1397757A2
EP1397757A2 EP00993551A EP00993551A EP1397757A2 EP 1397757 A2 EP1397757 A2 EP 1397757A2 EP 00993551 A EP00993551 A EP 00993551A EP 00993551 A EP00993551 A EP 00993551A EP 1397757 A2 EP1397757 A2 EP 1397757A2
Authority
EP
European Patent Office
Prior art keywords
hierarchy
data structure
hierarchical data
flattened
child
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00993551A
Other languages
German (de)
English (en)
Inventor
Sachinder S. Chawla
Alexander Gorelik
Sridhar Gantimahapatruni
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acta Technology Inc
Original Assignee
Acta Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acta Technology Inc filed Critical Acta Technology Inc
Publication of EP1397757A2 publication Critical patent/EP1397757A2/fr
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases

Definitions

  • the present invention relates to database processing in general and hierarchical database processing in particular.
  • trans actional data is collected as transactions occur and is then periodically aggregated into an analysis form to allow decision makers to glean information from the collected data.
  • a manufacturer might collect transactions relating to sales, ordering and production.
  • An example of such a transaction is a sales record, indicating an amount of the sale, the salesperson making the sale, the items sold, the office associated with that salesperson, and other related data.
  • these transaction records are often aggregated into multi-dimensional tables for analysis, to provide informative views of the organization's data.
  • a long list of thousands of sales records is not likely to be as useful to a decision maker as a report aggregating those sales records by geographical region, dates, types of items, salesperson, or other "dimension".
  • One approach to dealing with the volume of data created by an organization is to periodically populate an enterprise resource planning (ERP) database with transaction data stored in a multidimensional (MD) database.
  • ERP enterprise resource planning
  • MD multidimensional
  • An analyst or other decision maker could then query the MD database to obtain a "slice" of the data along a particular dimension to glean information about the data.
  • an analyst might query the MD database to obtain a report showing the total sales for each state for each month to get a breakdown of the annual U.S. sales of the organization.
  • the analyst is requesting a slice of the aggregated data in the MD database along the geographic and time dimensions.
  • the dimensions are hierarchical.
  • the ERP system needs to store business hierarchies, such as a cost center hierarchy, a cost element hierarchy or an organization reporting hierarchy.
  • the ERP system would include a hierarchy to indicate which cost centers belong in which sales region, which field offices are allocated to which cities, along with a hierarchy of cities, states and countries.
  • Such hierarchies can be stored in tables within the MD database system as a collection of child-parent nodes.
  • the MD data is organized by geography and sales data is aggregated by salesperson.
  • the hierarchy indicates, for each salesperson, an associated field office and each field office is associated with a sales region and each sales region is associated with a country. If an analyst queries a transactional database system to provide a report of sales by country, the transactional database system would have to examine each record in the transactional database system's table of sales records, and accumulate the results. In a real-time environment, where the analyst expects a real-time response to the query, such processing is difficult, and often impossible, to achieve in realtime. With an MD database, however, such processing is more easily done, because the data is aggregated.
  • the MD database system When the analyst makes the same query of an MD database system, the MD database system only needs to get the aggregated records and total them up according to the dimensions and slice indicated by the query. Thus, in response to a query for sales in a country of interest, the MD database system would query a geographical hierarchy table to identify which sales regions are in the country of interest, query the geographical hierarchy table to identify which field offices are in those sales regions, query the geographical hierarchy table to identify which salespersons are associated with those field offices, then query the MD database for sales by salesperson and total the sales returned.
  • This process of issuing multiple queries and summarizing the results takes time and will slow down the response to analyst queries, which can be a problem where the analyst is analyzing data in real-time, issuing queries, reviewing the responses and issuing other queries in response to the responses, in an effort to understand the organization's data.
  • improved performance of an MD database querying system would be a great benefit to analysts and organizations with large amounts of data to be analyzed.
  • a flattener In order to improve performance of a database querying system where a hierarchical structure is used in generating a response to a query, a flattener generates a horizontal representation or vertical representation of a transitive closure of a hierarchical data structure.
  • the flattened representation can be generated in advance of any queries, so that the database querying system can more quickly respond to a query.
  • the flattened representations are only generated as often as the hierarchical structure changes and are stored, so that a flattened representation does not need to be generated for each query.
  • a flattened representation represents the transitive closure of the hierarchical structure, in that all ancestor-descendant ' pairings are represented in the flattened structure. While the flattened structure typically stores the hierarchy much less efficiently than a standard parent-child hierarchy table, a precalculated flattened structure can improve the response times for responses to queries that require the hierarchical information.
  • FIG. 1 is a block diagram of a computer system according to one embodiment of the present invention.
  • Fig. 2 illustrates a hierarchy
  • Fig. 2(a) is a graph of the hierarchy
  • Fig. 2(b) shows a table data structure representing the hierarchy.
  • Fig. 3 is an illustration of a vertically flattened representation of the hierarchy shown in Fig. 2.
  • Fig. 4 is an illustration of a horizontally flattened representation of the hierarchy shown in Fig. 2.
  • Fig. 5 shows a table data structure representing an extended hierarchy, with child attributes stored in parent-child pairing records.
  • Fig. 6 shows a table data structure representing an extended hierarchy, with parent attributes and child attributes stored in parent-child pairing records; the table of Fig. 6 also illustrates how parent-child pairings are stored when a child node is a child of more than one parent node.
  • Fig. 7 illustrates the conversion of a single node hierarchy into a transitively closed hierarchy table.
  • Fig. 8 illustrates a hierarchy.
  • Fig. 9 illustrates _ n extended hierarchy;
  • Fig. 9(a) shows a table data structure representing the hierarchy;
  • Fig. 9(b) shows a table data structure representing the corresponding vertically flattened representation of the hierarchy.
  • Fig. 10 illustrates another extended hierarchy
  • Fig. 10(a) shows a table data structure representing the hierarchy
  • Fig. 10(b) shows a table data structure representing the corresponding horizontally flattened representation of the hierarchy.
  • a flattened hierarchy data structure is created for a hierarchical data structure so that the computer system can respond to queries more quickly.
  • a single query is all that is needed to obtain all the information about a hierarchy that is needed for responding to a query on the hierarchical data.
  • Fig. 1 illustrates such a computer system 10 according to one embodiment of the present invention.
  • Computer system 10 is shown comprising a transactional data system 20 that maintains transactional data 22, such as OLTP (online transaction processing) data.
  • Transactional data system 20 is coupled to a data warehouser 30 that extracts, transforms and loads the transactional data into an aggregated data set 32.
  • Aggregated data set 32 and associated metadata, including a hierarchical data structure 34, are provided to a flattener 40 and an analysis system 50, such as an OLAP (online analysis system) such as an SAP R/3 database system.
  • OLAP online analysis system
  • SAP R/3 database system SAP R/3 database system
  • Aggregated data set 32 is stored in an MD (multidimensional) database 52 in the format expected by analysis system 50.
  • Hierarchical data structure 34 might be provided to analysis system 50 for storage as a parent-child table 56, but where the parent-child data is not separately needed, hierarchical data structure 34 can be provided only to flattener 40, which generates a flattened hierarchy from the hierarchical data structure 34 and provides that flattened hierarchy to analysis system 50 for storage as a flattened hierarchy table in table structure 54.
  • analysis system could store hierarchical data structure 34 as parent-child table 56 and provide that table to flattener 40 for generation of the flattened hierarchy.
  • Table structure 54 comprises vertically flattened hierarchies or horizontally flattened hierarchies, or both.
  • An example of a simple hierarchy 80 is shown in Fig. 2.
  • Fig. 2(a) shows the hierarchy is graph form. As shown there, hierarchy 80 is made up of four nodes and three edges connecting those nodes. In a typical computer system, the hierarchy is likely to have many more nodes and possibly more levels.
  • the nodes are labeled "A", "B", "C” and "D”. Node A is the root node, node B is an intermediate node and nodes C and D are leaf nodes.
  • Hierarchy 80 is represented in a database system as a table, such as table 82 shown in Fig. 2(b). Table 82 can be a relational database table.
  • FIG. 3 illustrates a flat structure 84V that might result as a vertical flattening of table 82
  • Fig. 4 illustrates a corresponding horizontally flattened hierarchy 84H.
  • each possible ancestor-descendant pairing is represented by a record in the structure, including the degenerate cases where a node is paired with itself.
  • structure 84V is generated by flattener 40
  • flattener 40 adds the fields for the columns shown in Fig. 3 (DEPTH, ANCESTOR_ROOT and DESCEND ANT_LEAF).
  • the DEPTH field indicates the number of nodes between the ancestor and the descendant, including the descendant.
  • the ANCESTOR_ROOT field indicates whether or not the ancestor referenced by the row is a root node, i.e., has no parent, while the DESCEND ANT_LEAF field indicates whether or not the descendant referenced by the row is a leaf node, i.e., has no children.
  • These fields are useful for operations that are specific to root or leaf nodes. It should be understood that the structures and operations described herein apply to tree structures (single root, each child being a child of exactly one parent) as well as forest structures (possibly more than one tree, possibly more than one root per connected graph and multiple parent nodes for some child nodes).
  • Fig. 5 illustrates a more real-world example.
  • an organization stores its sales data with a geographical dimension.
  • a part of the corresponding hierarchy is shown as parent-child table 100 in Figure 5(a).
  • CA California
  • NY New York
  • Earth worldwide
  • the process of determining worldwide sales requires querying table 100 to determine the children of the root node record (Earth), then querying table 100 to determine the children of those children, and so on.
  • the multidimensional database can be queried for all data matching those root, intermediate and leaf nodes.
  • a flattened hierarchy such as vertically flattened hierarchy table 105 V shown in Fig. 5 (b)
  • querying to determine the relevant nodes can be done in one query.
  • a horizontally flattened hierarchy table is used, such as table 105H shown in Fig. 5 (c)
  • the query of table 105H requires knowledge of the "level" at which the data is being sliced.
  • table 105H also typically requires advanced knowledge of the maximum level for any leaf node.
  • the maximum level is two, so the columns labeled "" and "LEVEL_n” could be replaced with one column labeled "LEVEL_3" containing all nulls, or the LEVEL_3 column could be omitted and the leaf node status of the last two records in table 105H could be assumed by the fact that there is no column beyond LEVEL_2.
  • nodes of a hierarchy include attributes.
  • the children and parents are the same objects (i.e., have the same structure for their attributes).
  • a heterogeneous schema is one where different nodes have different schema.
  • a homogeneous hierarchy can be represented in a relational table as a set of parent-child records, such as the relational table shown in Fig. 2(b) or Fig. 5(b).
  • Information about a node could be stored in MD database 52, but where the information is specific to a node as it is positioned in its hierarchy, the preferred approach is to store the information as an attribute in the hierarchical data.
  • Fig. 6 shows a table 110 that is a relational table representing an extended hierarchy (an extended parent- child table) with attributes of the node stored in the table.
  • Each record represents a parent- child pairing and the attributes for the child node of the pair are stored with the record.
  • the attributes of the child node depend on the pairing.
  • a field labeled "Rank" is included in the table.
  • the rank values found in the Rank column represent the ranking, in size, of the child among other siblings (children of a common parent).
  • each child node is connected by an edge to only one parent node, then the ranking of the child is fixed and only one record per child is needed in table 110', but where a child node connects to more than one parent node (e.g., child node "Russia” connects to parent node “Earth” and parent node "Asia”), multiple records might be needed in table 110' to handle differing values for attributes that depend on which pairing is being examined.
  • transitive closure is a complete list of ancestors-descendant pairings.
  • a complete list would include all possible pairings of a node with itself, to accommodate the degenerate case of a single node hierarchy.
  • transitive closure table 116 results from the flattening process.
  • FIG. 9 A more complex, and more typical, flattening is shown in Fig. 9.
  • a two-level hierarchy with attributes is represented as table 120 shown in Fig. 9(a).
  • the corresponding vertically flattened hierarchy is shown as table 125V in Fig. 9(b).
  • the horizontally flattened hierarchy is represented by a table having a set of rows with one row for each edge and one column for each level or attribute. This type of representation is typically useful for normalized hierarchies - hierarchies where the depth (number of levels) is the same for most branches.
  • the vertically flattened hierarchy is represented by a table having a separate row for each ancestor-descendent pair (including pairings of a node with itself). This type of representation is typically used for non-normalized hierarchies - hierarchies where the depth is very different for different branches.
  • each row would include the parent atributes for the ancestor and the child attributes for the descendent.
  • Fig. 10(a) shows a parent-child table 130 that includes the parent attribute fields "PS" (PoliticalSystem) and the child attribute fields "PE" (PoliticalEntity) and "CR” (Child Rank, in population).
  • PS Parent-System
  • PE PolyEntity
  • CR Child Rank, in population
  • the root node attribute is stored in the "PtypeO" field
  • level 1 child attributes are stored in the "Ctypel” and “Crankl” fields
  • level 1 parent attributes are stored in the "Ptypel” field
  • level 2 child attributes are stored in the "Ctype2" and “Crank2” fields.
  • level 2 parent attributes because none of the nodes at level 2 are parent nodes; they are all leaf nodes.
  • level 0 child attributes because level 0 is the root level and therefore there are no children nodes at that level.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Dans un système de base de données informatisées, les demandes sont appliquées à des données stockées qui comprennent au moins une hiérarchie représentant des divisions de ces données stockées et les réponses à ces demandes dépendent de cette hiérarchie. On entre une structure hiérarchique de données représentant des appariements parent-enfant et on génère une structure hiérarchique de données aplanie représentant une fermeture transitive de cette hiérarchie représentée par cette structure hiérarchique de données. On utilise cette structure hiérarchique de données aplanie pour générer la réponse à une demande.
EP00993551A 1999-12-20 2000-12-20 Dispositif de restructuration de donnees pour hierarchies aplanissantes Withdrawn EP1397757A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US46826399A 1999-12-20 1999-12-20
US468263 1999-12-20
PCT/US2000/035183 WO2001046866A2 (fr) 1999-12-20 2000-12-20 Dispositif de restructuration de donnees pour hierarchies aplanissantes

Publications (1)

Publication Number Publication Date
EP1397757A2 true EP1397757A2 (fr) 2004-03-17

Family

ID=23859107

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00993551A Withdrawn EP1397757A2 (fr) 1999-12-20 2000-12-20 Dispositif de restructuration de donnees pour hierarchies aplanissantes

Country Status (4)

Country Link
EP (1) EP1397757A2 (fr)
AU (1) AU2912701A (fr)
CA (1) CA2395244A1 (fr)
WO (1) WO2001046866A2 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4878178B2 (ja) * 2006-02-28 2012-02-15 株式会社日立製作所 データ処理方法および装置並びにその処理プログラム
FR2920898B1 (fr) 2007-09-11 2010-07-30 Marc Vogel Installation de gestion d'une base de donnees
US8825700B2 (en) * 2008-05-26 2014-09-02 Microsoft Corporation Paging hierarchical data
US9824121B2 (en) * 2012-08-10 2017-11-21 Arris Enterprises Llc Aggregate data streams in relational database systems
US10895973B2 (en) * 2019-04-23 2021-01-19 Google Llc Browsing hierarchical datasets

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5915257A (en) * 1994-10-11 1999-06-22 Brio Technology, Inc. Cross tab analysis and reporting method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0146866A2 *

Also Published As

Publication number Publication date
WO2001046866A2 (fr) 2001-06-28
CA2395244A1 (fr) 2001-06-28
WO2001046866A3 (fr) 2003-12-24
AU2912701A (en) 2001-07-03

Similar Documents

Publication Publication Date Title
US9535970B2 (en) Metric catalog system
US9092467B2 (en) Systems and methods for displaying data in split dimension levels
US20210073188A1 (en) System and method for automatic inference of a cube schema from a tabular data for use in a multidimensional database environment
US8200618B2 (en) System and method for analyzing data in a report
US7480663B2 (en) Model based optimization with focus regions
US9075859B2 (en) Parameterized database drill-through
US20070250524A1 (en) Method and apparatus for workload and model based materialized query table or view recommendation technique
US20070061287A1 (en) Method, apparatus and program storage device for optimizing a data warehouse model and operation
US7698314B2 (en) Computer-implemented systems and methods for providing a counting measure
US20050010550A1 (en) System and method of modelling of a multi-dimensional data source in an entity-relationship model
WO2001042987A1 (fr) Construction recursive dynamique pour bases de donnees multidimensionnelles et procedes et appareil associes
KR101829198B1 (ko) 보고서의 중요도를 분석하는 메타 데이터 기반 온라인 분석 프로세싱 시스템
US9002845B1 (en) Systems and methods of analyzing changes and data between hierarchies
US7636709B1 (en) Methods and systems for locating related reports
WO2001046866A2 (fr) Dispositif de restructuration de donnees pour hierarchies aplanissantes
Jarke et al. Multidimensional data models and aggregation
US20220012242A1 (en) Hierarchical datacube query plan generation
Breitner Data Warehousing and OLAP: Delivering Just-In-Time Information for Decision Support
Gallo et al. Data warehouse design and management: theory and practice
Chaturvedi et al. On-line analytical processing in agriculture using multidimensional cubes
Agrawal et al. Analysis of multidimensional modeling related to conceptual level
Francett Decisions, decisions: users take stock of data warehouse shelves.
Bog et al. Enterprise Data Management for Transaction and Analytical Processing
Srinivasa Query processing issues in data warehouses
Khan et al. Development and Evaluation of Measuring Parameters for Data Warehouse

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020722

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20050701