EP2396753A1 - Etl builder - Google Patents

Etl builder

Info

Publication number
EP2396753A1
EP2396753A1 EP10740832A EP10740832A EP2396753A1 EP 2396753 A1 EP2396753 A1 EP 2396753A1 EP 10740832 A EP10740832 A EP 10740832A EP 10740832 A EP10740832 A EP 10740832A EP 2396753 A1 EP2396753 A1 EP 2396753A1
Authority
EP
European Patent Office
Prior art keywords
data
cube
business
cash
account
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP10740832A
Other languages
German (de)
French (fr)
Other versions
EP2396753A4 (en
Inventor
Mark Joseph Ledwich
James Henry Wilson
Frank Chi-Jao Lin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zap Holdings Ltd
Original Assignee
Zap Holdings Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2009900510A external-priority patent/AU2009900510A0/en
Application filed by Zap Holdings Ltd filed Critical Zap Holdings Ltd
Publication of EP2396753A1 publication Critical patent/EP2396753A1/en
Publication of EP2396753A4 publication Critical patent/EP2396753A4/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • This invention relates to the creation of a data warehouse for use in B I (Business Intelligence) systems.
  • B I Business Intelligence
  • Relational databases for CRM and ERP are usually customised to suit the business needs of particular industries. Although some computer companies provide cubes that can be used with these databases they do not take account of the customisations that have taken place.
  • BI systems To enable BI systems to carry out their analysis a cumbersome and expert driven process of synchronizing the databases to the analysis cube is needed. The cost of this process is a deterrent to purchasing and implementing BI systems and only large enterprises can justify the costs involved.
  • the first step of this process elicits the business requirements for the system from the users in the organization. This would typically involve a consultant interviewing users around business processes and jointly determining the information those users require to do their job on a day to day basis as well as provide them with information to improve their decision making capabilities. Once business requirements have been gathered, the consultant will identify what data is required and in which system that this data currently resides.
  • the data is extracted from each source system into a staging database.
  • This database is transformed into a star schema structure.
  • Each ETL task must be designed to do this task efficiently. Transformation of the data also need to be designed at this point such as converting methods of converting complex ERP structures into simple reporting structures.
  • the warehouse must be designed in such a way as to allow large volumes of data to be accessed rapidly. It must also have a structure that will allow reports to be easily constructed against them.
  • Cubes are constructed of measures and dimensions. Measures represent how an item is measured. For example, a sales representative is measured against revenue and margin. Dimensions break the measures down into business categories. For example, a sales representative is a dimension, the customer is a dimension and the date is a dimension.
  • Reports must be designed to meet the business requirements. Report Parameters, subtotals, headings and format need to be thought through.
  • the build phase commences and the following items must be created.
  • a business intelligence developer usually performs this task. For example, with Microsoft's SQL Server, the following tasks would need to be performed by a product specialist:
  • the implementation phase includes the steps:
  • building a business intelligence solution for an ERP system is a labour-intensive, specialist-driven process with many complexities.
  • USA patent application 2005/0149583 discloses a method of merging data in two different versions of the same database by comparing the two databases' metadata and using a difference algorithm to identify the differences and then develop a metadata exchange strategy to merge the two databases.
  • WO 2007/95959 discloses a method of generating data warehouses and OLAP cubes without requiring knowledge of database query languages.
  • the system uses a star schema. This approach still requires expertise in building the data ware house for the OLAP cube and this is often too expensive for smaller scale busineses.
  • WO 2007072501 discloses a system for a business performance platform that has a data source, an instrumentation layer for deriving measurement information from multiple formats and integrating it into a canonical format, a consolidation layer for filtering and preprocessing instrumentation layer output , a business modelling layer and a presentation layer.
  • USA application 2006/0271568 discloses a method of assembling a data ware house using data reduction, aggregate and dimension and fulfilment processes.
  • USA patent application 2005/0033726 discloses a business intelligence system that does a way with a data store and uses meta data view module to access data organised on the basis of data connection, data foundation, Business efemnt and business view and security.
  • WO 2007072501 discloses a system for a business performance platform that has a data source, an instrumentation layer for deriving measurement information from multiple formats and integrating it into a canonical format, a consolidation layer for filtering and preprocessing instrumentation layer output , a business modelling layer and a presentation layer.
  • To this end invention provides a method of building a staging database for use in creating an OLAP cube for business intelligence applications, which examines the staging metadata database and constructs a schedule of operations to extract the data, transform it and load it into the staging database by using normalisation, translation blocks and filtering to create a business view of the data that can be seen in the cube using standard query languages.
  • the present invention presents a method for automating the complex ETL processes typically required in building a BI solution based on an ERP system.
  • the invention does not require a traditional data warehouse to build a cube.
  • the final output of this invention is a staging database which is used as the source database in the process previously described in copending application 2008905207.
  • the staging metadata database used in this invention is prepared according to the method of co-pending application ... DEFINITIONS CRM
  • the schema of a database system is its structure described in a formal language supported by the database management system (DBMS).
  • DBMS database management system
  • the schema defines the tables, the fields in each table, and the relationships between fields and tables.
  • Enterprise Resource Planning is an industry term for the broad set of activities supported by multi-module application software that helps a manufacturer or other business manage the important parts of its business, including product planning, parts purchasing, maintaining inventories MDX
  • MDX The leading query language for multi-dimensional databases is MDX, which was created to query OLAP databases, and has become widely adopted with the realm of OLAP applications. Normalization
  • OnLine Analytical Processing systems enable executives to gain insight into data by providing fast, interactive access to a variety of possible views of information.
  • the following definitions introduce concepts that reflect the multidimensional view and are basic to OLAP.
  • a “dimension” is a structure that categorizes data. Commonly used dimensions include customer, product, and time. Typically, a dimension is associated with one or more hierarchies. Several distinct dimensions, combined with measures, enable end users to answer business questions. For example, a Time dimension that categorizes data by month helps to answer the question, "Did we sell more widgets in January or June?"
  • a “measure” includes data, usually numeric and on a ratio scale, that can be examined and analysed. Typically, one or more dimensions categorize a given measure, and it is described as “dimensioned by” them.
  • a "hierarchy” is a logical structure that uses ordered levels as a means of organizing dimension members in parent-child relationships. Typically, end users can expand or collapse the hierarchy by drilling down or up on its levels.
  • a “level” is a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the day, month, quarter and year levels.
  • An “attribute” is a descriptive characteristic of the elements of a dimension that an end user can specify to select data. For example, end users might choose products using a colour attribute. In this instance, the colour attribute is being used as an "axis of aggregation". Some attributes can represent keys or relationships into other tables.
  • a “query” is a specification for a particular set of data, which is referred to as the query's result set. The specification requires selecting, aggregating, calculating or otherwise manipulating data. If such manipulation is required, it is an intrinsic part of the query.
  • Metadata is a key concept involved in this invention. Metadata is essentially data about data. It is information describing the entities in a database (either relational or multidimensional). It also contains information on the relationship between these entities and the security information detailing what information users are permitted to see.
  • Data is typically stored in multiple tables in a database. Often the records in one table relate to an entity in another table. Where this is the case, the two tables are considered to be related.
  • a special value can be stored with each row that links it to the base entity. For example, imagine a database with a customer table and an address table. The address table has an additional field, CustomerlD, which links it with the corresponding customer record in the customer table.
  • Figure 1 is a schematic outline of the system which utilises the ETL builder of this invention .
  • This invention uses the staging metadata as prepared by copending application PCT/AU09/001326 the disclosures of which are herby incorporated by reference.
  • the ETL builder examines the staging database metadata and constructs a schedule of operations to extract the data, transform it and load it into the staging database.
  • This schedule can then be translated into an appropriate language for the database management system, such as SQL Server Integration Services packages, and then handed off for execution.
  • the present invention includes a de-normalization step which simplifies the resulting cube structure and improves performance. This process is outlined in copending applicationAU2009900509 Once this new table structure has been derived, the present invention automatically generates the ETL code necessary to effect this denormalization. A description of the derivation of this structural change is included here for reference. This is achieved by combining two or more tables together, such that each row contains all the information, originally shared across tables, relevant to each record.
  • a user wishes to report on sales value, cost of sale and margin, and this is normally done by summarizing the items on the Sales Line table. In this case however, the user also wants to view the same values by Sales Person.
  • Option 1 represents the status quo and leads to a complex cube with poor performance.
  • Option 3 leads to an unnecessarily complex cube with referential dimensions.
  • the present invention provides application blocks that allow you to process the source data of specific applications to facilitate comprehensive analysis and reporting. Generic operations supported are
  • Navision refers to financial statement designs as account schedules.
  • the Navision ERP system uses the Account Schedule Module to allow users to produce financial statements or key performance indicators.
  • the user has the ability to create reports using Rows and columns which either contain ranges of account numbers from the general Ledger or expressions.
  • Step 1 Provides Account Schedules
  • This example deals with two account types..
  • Accounts - these contain accounts or account ranges, the accounts are used to Aggregate General Ledger totals.
  • the '..' is used to specify a range of accounts
  • Step 2- Lookup the General Ledger Accounts
  • Step 3 Account Explosion
  • the records shown in Table 3 will now be inserted into the stage account schedule. If the exploded account number has a total account such as account 11169, it will also have to be exploded to ensure that only posting accounts are inserted.
  • Step 4 Insert Dynamics NAV Dimensions
  • Each account schedule line may also have NAV Dimensions linked to them so another level of explosion has to take place for each Dimension added:
  • Lisa is a customer with ABC Inc. She first lived in Brisbane, Queensland. So, the original entry in the customer lookup table has the following record:
  • Type 1 The new record replaces the original record. No trace of the old record exists.
  • Type 2 A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
  • Type 3 The original record is modified to reflect the change.
  • the invention we present solves this problem automatically by applying the following policy:
  • Each record will by default employ a Type 1 approach, however if the user chooses to perform incremental updates to their cube then the Type 2 approach will be used.
  • the present invention introduces the concept of a "Business View”. This invention empowers business users to filter data and introduce calculated fields to ETL jobs without any technical knowledge. Often data will need to be excluded from a cube based on criteria defined by the user. Each filter will be applied at the table level. Users may also wish to include new columns that are calculated from others.
  • the TIVA Corporation has changed direction and has discontinued their 'Bargain Bin' product line. This data is of no use in reports or for cube analysis and needs to be excluded from the cube. To do this a filter must be applied to across the tables that reference product line.
  • the screen shot above illustrates how the invention allows the user to simply specify a filter condition. Behind the scenes, the invention translates this expression into an SQL statement in the SSIS package created by the tool.
  • the filter supports a full range of Boolean logical operations on any number of columns.
  • the calculated fields can be derived using any mathematical expression or set operation on the columns of the table.
  • Example 1 The 'Bargain Bin' product line may exist on various tables such as Sales Quotes or
  • the functionality will allow the user to automatically copy all of the filters to the other tables. If the field name is different data samples will be used to determine the correct column to apply the filter on.
  • Example 2 Consider the case where the first 2 characters of the department code identify the cost centre for a department. A derived column can be created that identifies the cost centre and this would automatically be shared amongst all tables where the department code exists.
  • the Wizard When large volumes of data of a transactional nature are included in the cube, past a particular threshold, the Wizard creates what is known as a relational dimension or ROLAP dimension, rather than a standard OLAP dimension. This results in smaller cubes, less processing time and greater query performance. From the above it can be seen that the present invention provides a time and cost saving solution for automatically implementing the generated design of an OLAP cube by generating the necessary ETL code, Those skilled in the art will realise that this invention may be implemented in embodiments other than those described without departing from the core teachings of this invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method of building a staging database which examines the staging database metadata and constructs a schedule of operations to extract the data, transform it and load it into the staging database by using normalisation, translation blocks and filtering to create a business view of the data that can be seen using standard query languages. This is applied to products such as Navision in building an OLAP cube for use in business intelligence applications.

Description

ETL Builder
This invention relates to the creation of a data warehouse for use in B I (Business Intelligence) systems. Background to the Invention Relational databases for CRM and ERP are usually customised to suit the business needs of particular industries. Although some computer companies provide cubes that can be used with these databases they do not take account of the customisations that have taken place. To enable BI systems to carry out their analysis a cumbersome and expert driven process of synchronizing the databases to the analysis cube is needed. The cost of this process is a deterrent to purchasing and implementing BI systems and only large enterprises can justify the costs involved.
In preparing an ERP system for BI the usual steps are to establish the business requirements, source the data requirements, design, build, implement and also manage security.
The first step of this process elicits the business requirements for the system from the users in the organization. This would typically involve a consultant interviewing users around business processes and jointly determining the information those users require to do their job on a day to day basis as well as provide them with information to improve their decision making capabilities. Once business requirements have been gathered, the consultant will identify what data is required and in which system that this data currently resides.
The design phase will be undertaken by the technical consultant and will consist of the following items.
• Extraction Transformation and Load (ETL)
The data is extracted from each source system into a staging database. This database is transformed into a star schema structure. Each ETL task must be designed to do this task efficiently. Transformation of the data also need to be designed at this point such as converting methods of converting complex ERP structures into simple reporting structures.
• Data Warehouse Design
The warehouse must be designed in such a way as to allow large volumes of data to be accessed rapidly. It must also have a structure that will allow reports to be easily constructed against them.
• Cube Design
The cube must be designed so that it can support all of the business requirements. This is typically a complex iterative process involving business analysts and business intelligence specialists. Cubes are constructed of measures and dimensions. Measures represent how an item is measured. For example, a sales representative is measured against revenue and margin. Dimensions break the measures down into business categories. For example, a sales representative is a dimension, the customer is a dimension and the date is a dimension.
• Report Design
Reports must be designed to meet the business requirements. Report Parameters, subtotals, headings and format need to be thought through.
Once the design phase is completed, the build phase commences and the following items must be created. As this is technical in nature, a business intelligence developer usually performs this task. For example, with Microsoft's SQL Server, the following tasks would need to be performed by a product specialist:
Task Expertise Required
Build data warehouse SQL Server Relational Databases
SQL Server Management Studio
Report Design SQL Server Reporting Services
The implementation phase includes the steps:
• Installation
Install the ETL jobs, cubes, data warehouse and reports.
• Testing The whole process must be tested to ensure that the cube and the reports are delivering the correct results to the user. This is normally done by reviewing reports from source systems and then validating them against the BI system.
• Training
Ensure that the users can use the cubes and the reports efficiently and that technical support staff have the ability to maintain and customize the system over time.
Traditionally, security requirements for the BI system were gathered as part of the requirements stage. This was built into the cube manually, but a major source of labour was ongoing maintenance and manual synchronization efforts to ensure only the right people saw privileged information.
As illustrated above, building a business intelligence solution for an ERP system is a labour-intensive, specialist-driven process with many complexities.
USA patent application 2005/0149583 discloses a method of merging data in two different versions of the same database by comparing the two databases' metadata and using a difference algorithm to identify the differences and then develop a metadata exchange strategy to merge the two databases.
WO 2007/95959 discloses a method of generating data warehouses and OLAP cubes without requiring knowledge of database query languages. The system uses a star schema. This approach still requires expertise in building the data ware house for the OLAP cube and this is often too expensive for smaller scale busineses.
WO 2007072501 discloses a system for a business performance platform that has a data source, an instrumentation layer for deriving measurement information from multiple formats and integrating it into a canonical format, a consolidation layer for filtering and preprocessing instrumentation layer output , a business modelling layer and a presentation layer. USA application 2006/0271568 discloses a method of assembling a data ware house using data reduction, aggregate and dimension and fulfilment processes. USA patent application 2005/0033726 discloses a business intelligence system that does a way with a data store and uses meta data view module to access data organised on the basis of data connection, data foundation, Business efemnt and business view and security.
WO 2007072501 discloses a system for a business performance platform that has a data source, an instrumentation layer for deriving measurement information from multiple formats and integrating it into a canonical format, a consolidation layer for filtering and preprocessing instrumentation layer output , a business modelling layer and a presentation layer.
It is an object of this invention to provide an automatic method of preparing a data store and to then construct a schedule of operations to extract the data , transform it and load it into a staging database from which an OLAP Cube can be created.
Brief Description of the Invention
To this end invention provides a method of building a staging database for use in creating an OLAP cube for business intelligence applications, which examines the staging metadata database and constructs a schedule of operations to extract the data, transform it and load it into the staging database by using normalisation, translation blocks and filtering to create a business view of the data that can be seen in the cube using standard query languages.
As illustrated above, building a business intelligence solution for an ERP system is a labour-intensive, specialist-driven process with many complexities. The present invention presents a method for automating the complex ETL processes typically required in building a BI solution based on an ERP system. Of particular note, the invention does not require a traditional data warehouse to build a cube.
The final output of this invention is a staging database which is used as the source database in the process previously described in copending application 2008905207. The staging metadata database used in this invention is prepared according to the method of co-pending application ... DEFINITIONS CRM
Customer Relationship Management Cube A multi-dimensional database optimized for fast retrieval and aggregation of data ETL - Extract Transformrand load in data ware housing involves extracting data from outside sources, transforming the^data to fit needs in the OLAP cube and loading it into a staging data base DSV Data Source View - a view of the base system data which maps more naturally to its definition in the cube than the raw data Database Schema
The schema of a database system is its structure described in a formal language supported by the database management system (DBMS). In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables. ERP
Enterprise Resource Planning is an industry term for the broad set of activities supported by multi-module application software that helps a manufacturer or other business manage the important parts of its business, including product planning, parts purchasing, maintaining inventories MDX
The leading query language for multi-dimensional databases is MDX, which was created to query OLAP databases, and has become widely adopted with the realm of OLAP applications. Normalization
In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics — insertion, update, and deletion anomalies — that could lead to a loss of data integrity. However, this form is not optimal for querying which is why OLAP cubes have a different structure. OLAP
OnLine Analytical Processing systems enable executives to gain insight into data by providing fast, interactive access to a variety of possible views of information. The following definitions introduce concepts that reflect the multidimensional view and are basic to OLAP.
A "dimension" is a structure that categorizes data. Commonly used dimensions include customer, product, and time. Typically, a dimension is associated with one or more hierarchies. Several distinct dimensions, combined with measures, enable end users to answer business questions. For example, a Time dimension that categorizes data by month helps to answer the question, "Did we sell more widgets in January or June?"
A "measure" includes data, usually numeric and on a ratio scale, that can be examined and analysed. Typically, one or more dimensions categorize a given measure, and it is described as "dimensioned by" them.
A "hierarchy" is a logical structure that uses ordered levels as a means of organizing dimension members in parent-child relationships. Typically, end users can expand or collapse the hierarchy by drilling down or up on its levels.
A "level" is a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the day, month, quarter and year levels. An "attribute" is a descriptive characteristic of the elements of a dimension that an end user can specify to select data. For example, end users might choose products using a colour attribute. In this instance, the colour attribute is being used as an "axis of aggregation". Some attributes can represent keys or relationships into other tables. A "query" is a specification for a particular set of data, which is referred to as the query's result set. The specification requires selecting, aggregating, calculating or otherwise manipulating data. If such manipulation is required, it is an intrinsic part of the query.
"Metadata" is a key concept involved in this invention. Metadata is essentially data about data. It is information describing the entities in a database (either relational or multidimensional). It also contains information on the relationship between these entities and the security information detailing what information users are permitted to see.
Relationship
Data is typically stored in multiple tables in a database. Often the records in one table relate to an entity in another table. Where this is the case, the two tables are considered to be related. In a relational database for example, a special value can be stored with each row that links it to the base entity. For example, imagine a database with a customer table and an address table. The address table has an additional field, CustomerlD, which links it with the corresponding customer record in the customer table.
Detailed Description of the Invention
A preferred embodiment of the invention will be described with reference to the drawings in which: Figure 1 is a schematic outline of the system which utilises the ETL builder of this invention .
This invention uses the staging metadata as prepared by copending application PCT/AU09/001326 the disclosures of which are herby incorporated by reference.
ETL Builder
The ETL builder examines the staging database metadata and constructs a schedule of operations to extract the data, transform it and load it into the staging database.
This schedule can then be translated into an appropriate language for the database management system, such as SQL Server Integration Services packages, and then handed off for execution.
De-normalization
Most databases are in a relatively normalized form which facilitates smaller database sizes and faster updates but this is suboptimal for querying and analysis. They also lead to relatively complex cubes.
The present invention includes a de-normalization step which simplifies the resulting cube structure and improves performance. This process is outlined in copending applicationAU2009900509 Once this new table structure has been derived, the present invention automatically generates the ETL code necessary to effect this denormalization. A description of the derivation of this structural change is included here for reference. This is achieved by combining two or more tables together, such that each row contains all the information, originally shared across tables, relevant to each record.
Example
Consider the following tables:
A user wishes to report on sales value, cost of sale and margin, and this is normally done by summarizing the items on the Sales Line table. In this case however, the user also wants to view the same values by Sales Person.
Ordinarily, to do this in the cube it would mean that we have to include the Sales Header table, which is really only needed for its Customer Number and Sales Person Number fields.
Solution
There are 3 ways of handling this
1. Modify the query in the cube to include all three tables 2. Merge the fields of the three tables into a single table.
3. Add the items as dimensions and measure groups
Option 1 represents the status quo and leads to a complex cube with poor performance. Option 3 leads to an unnecessarily complex cube with referential dimensions.
The best solution is Option 2 and results in the following table which retains all information but allows for faster, simpler queries. Order Number
Line Number
Item Number
Value
Cost
Margin
Customer Number
Sales Person Number
Translation Block
Often the source data of a specific application will have its own nuances and way of doing things that make it difficult to unwind and group data in order for analysis. The present invention provides application blocks that allow you to process the source data of specific applications to facilitate comprehensive analysis and reporting. Generic operations supported are
• Lookup tables
• Recursively expanding ranges
• Filtering
• Parsing basic mathematical expressions and translating them into MDX These basic capabilities can be configured to support specific applications, allowing the user to import these reporting designs directly into the solution, providing a seamless reporting environment that faithfully reflects their current system (for example, financial statement designs derived from account schedules in Microsft Dynamics NAV).
Financial Statement Designs for Microsoft Dynamics NAV (Navision)
Navision refers to financial statement designs as account schedules.
The Navision ERP system uses the Account Schedule Module to allow users to produce financial statements or key performance indicators. The user has the ability to create reports using Rows and columns which either contain ranges of account numbers from the general Ledger or expressions.
The example below refers to these two tables from a Navision system. Table 1 - Account Schedule
Line No Type Line Description Account
10 1 Cash and cash equivalents at beginning of period 11299
20 1 Movement in Clearing Accounts 11399
30 1 Movement in Deposits Held 11599
40 1 Movement in Earned & Unbilled Revenue 11699
50 2 Total 10+20+30+40
Table 2 - Chart of Accounts
Account Account Name Totalling
11299 Cash & Cash Equivalents Total 11100..11299
11399 Clearing Accounts Total 11300..11399
11599 Deposits Held Total 11500..11599
11699 Earned & Unbilled Rev Total 11600..11699
Table 3 - Accounts Exploded
Stage Account Schedule
Line Account Description Group
10 11100 Cash & Cash Equivalents Cash and cash equivalents at beginning of period
10 11110 Cash on Hand Cash and cash equivalents at beginning of period
10 11120 Investments on Call Cash and cash equivalents at beginning of period
10 11122 Listed Securities Cash and cash equivalents at beginning of period
10 11129 Investments on Call Total Cash and cash equivalents at beginning of period
10 11130 Operating Accounts Cash and cash equivalents at beginning of period
10 11152 DBS Bank SGD Cash and cash equivalents at beginning of period
10 11153 DBS Bank AUD Cash and cash equivalents at beginning of period
10 11169 Operating Accounts Total Cash and cash equivalents at beginning of period
10 11170 Petty cash floats Cash and cash equivalents at beginning of period 11172 Petty Cash - Airlie Beach Cash and cash equivalents at beginning of period
11174 Petty Cash - Sydney Cash and cash equivalents at beginning of period
11176 Petty Cash - Korea Cash and cash equivalents at beginning of period
11179 Petty cash Total Cash and cash equivalents at beginning of period
11190 Voucher & Tokens Cash and cash equivalents at beginning of period
11192 Corporate Gifts Cash and cash equivalents at beginning of period
11194 Gift Vouchers Cash and cash equivalents at beginning of period
11196 Parking Vouchers Cash and cash equivalents at beginning of period
11199 Voucher & Tokens Total Cash and cash equivalents at beginning of period
11200 Treasury Accounts Cash and cash equivalents at beginning of period
11299 Cash & Cash Equivalents Total Cash and cash equivalents at beginning of period
Step 1 -Process Account Schedules
Account schedules are being imported into the solution to allow users to retain all of the IP built up by creating the schedules and have the advanced reporting capabilities which Zap offers.
The following is an example of a NAV account Schedule the major components as far as the Zap solution are concerned are schedule name, Line Number, Type,
Description and Account number.
Type
This example deals with two account types..
1. Accounts - these contain accounts or account ranges, the accounts are used to Aggregate General Ledger totals. The '..' is used to specify a range of accounts
2. Expressions - These contain expressions used on the report for calculated items in our example we are using + to add the lines together. The following may also be used when creating expressions :
+ Add
Subtract / Divide
Multiply I add
Range
The application must parse these expressions to allow them to be reported on using the cube. To do this the following steps need to be followed. Step 2- Lookup the General Ledger Accounts
Using the account number in the schedule the application will look up the relevant accounts on the Chart of Accounts table within NAV. In Table 2 the accounts have been returned after looking them up using the Accounts in the account schedule shown in Table 1.
In order to aggregate values from General Ledger Transaction (GL Trans) table we need to be using posting accounts. The accounts we have returned in table 1 are Total accounts which do not have any values posted against them in the general ledger. The Totalling field contains the range of posting accounts required to link to the transaction table. The Account 11299 'Cash & Cash Equivalents Total' contains the range of 11100 to 11299, we will need to look up the accounts for this range to link to the GL Trans table.
Step 3 - Account Explosion The records shown in Table 3 will now be inserted into the stage account schedule. If the exploded account number has a total account such as account 11169, it will also have to be exploded to ensure that only posting accounts are inserted.
Step 4 - Insert Dynamics NAV Dimensions Each account schedule line may also have NAV Dimensions linked to them so another level of explosion has to take place for each Dimension added:
Line Totaling Dimension! Totaling
Revenue Area 10..30, Total 6110..6195 10..30
Revenue Area 40..85, Total 6110..6195 40..85
Account Dimensionl
Revenue Area 10..30, Total 6110 10
Revenue Area 10..30, Total 6110 20
Revenue Area 10..30, Total 6110 30
Revenue Area 40..85, Total 6110 40
Revenue Area 40..85, Total 6110 50
Revenue Area 10..30, Total 6111 10
Revenue Area 10..30, Total 6111 20 The result of the explosion of all of these records now presents an issue when linking back to the GL transaction table as the granularity of the new account schedule line is lower than the account number. To resolve this a "many to many" dimension is created in the cube.
Step 5- Inserting Expression Records
Once all the account schedule records and the accounts have been exploded the expressions can be inserted. The Nav expression must be now be parsed into MDX.
In our example we are adding the lines together '10+20+30+40'. This is achieved by looking up the relevant accounts and obtaining the maximum and minimum accounts numbers to create an MDX set. The line number 10 will be converted into the following
[Account].[11100]:[Account].[11299] + [Account].[11300]:[Account].[11399]+
[Account].[11500]:[Account].[11599]+ [Account].[11600]:[Account].[11699] This now forms a custom rollup in the Analysis Services cube.
Many To Many Dimensions
In the particular embodiment of the invention used in this example, the "Many to Many" functionality of Microsoft Analysis Services is utilised to link the new converted account schedules back to the GL Entry table.
Slowly Changing Dimensions
The "Slowly Changing Dimension" problem is a common one particular to reporting. Succinctly, this applies to cases where the attribute for a record varies over time. We give an example below:
Lisa is a customer with ABC Inc. She first lived in Brisbane, Queensland. So, the original entry in the customer lookup table has the following record:
Customer Key Name State
1000 ": ~ -tisaf. 1 Queensland ' ^J
At a later date, she moved to Sydney, New South Wales on January, 2009. How should ABC Inc. now modify its customer table to reflect this change? This is the
"Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:
Type 1: The new record replaces the original record. No trace of the old record exists. Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
Customer Key Name State Status
HOOO = X- . '' t.is-3 " "K, -"Queensland,/ V*,'' Inactive - > Λ - H
1001 Lisa New South Wales Active
Type 3: The original record is modified to reflect the change. The invention we present solves this problem automatically by applying the following policy:
Each record will by default employ a Type 1 approach, however if the user chooses to perform incremental updates to their cube then the Type 2 approach will be used.
Business Views
To allow business users to quickly and easily create a view of their data for analysis, the present invention introduces the concept of a "Business View". This invention empowers business users to filter data and introduce calculated fields to ETL jobs without any technical knowledge. Often data will need to be excluded from a cube based on criteria defined by the user. Each filter will be applied at the table level. Users may also wish to include new columns that are calculated from others.
Example
The TIVA Corporation has changed direction and has discontinued their 'Bargain Bin' product line. This data is of no use in reports or for cube analysis and needs to be excluded from the cube. To do this a filter must be applied to across the tables that reference product line.
The screen shot above illustrates how the invention allows the user to simply specify a filter condition. Behind the scenes, the invention translates this expression into an SQL statement in the SSIS package created by the tool. The filter supports a full range of Boolean logical operations on any number of columns. Similarly, the calculated fields can be derived using any mathematical expression or set operation on the columns of the table.
Global Filters and Calculated Expressions Calculations and filters created using the same method as the business view can be applied to all tables where the columns contained in that expression are used.
This eliminates the tedious and repetitive process of introducing common operations germane to many tables.
Example 1 The 'Bargain Bin' product line may exist on various tables such as Sales Quotes or
Inventory. The functionality will allow the user to automatically copy all of the filters to the other tables. If the field name is different data samples will be used to determine the correct column to apply the filter on.
Example 2 Consider the case where the first 2 characters of the department code identify the cost centre for a department. A derived column can be created that identifies the cost centre and this would automatically be shared amongst all tables where the department code exists.
When large volumes of data of a transactional nature are included in the cube, past a particular threshold, the Wizard creates what is known as a relational dimension or ROLAP dimension, rather than a standard OLAP dimension. This results in smaller cubes, less processing time and greater query performance. From the above it can be seen that the present invention provides a time and cost saving solution for automatically implementing the generated design of an OLAP cube by generating the necessary ETL code, Those skilled in the art will realise that this invention may be implemented in embodiments other than those described without departing from the core teachings of this invention.

Claims

1. A computer operable method of building a staging database for use in creating an OLAP cube for business intelligence applications, which uses a computer to examine the staging database metadata and constructs a schedule of operations to extract the data, transform it and load it into the cube by using translation blocks that facilitate lookup tables, recursively expanding ranges, filtering and parsing of mathematical expressions into MDX.
2. A method as claimed in claim 1 for generating ETL packages thatuses a computer to reproduce the business logic for report design stored in the source system.
3. A method as claimed in claim 2 which includes extracting data from a source system's reporting design system by expanding range hierarchies using expressions that have been stored within these designs and converting them into OLAP structures.
4. A method as in claim 1 where the user has the ability to add additional reporting functionality to the cube through creation of additional tables and business views.
5. A method as in claim 1 which transparently accounts for slowly changing dimensions by replacing a changed record and deleting the original or at the users instigation create a new record.
6. A method as in claim 4 where entities specified in the business view are automatically promulgated, by allowing the user to insert new expressions or filters.
7. A method as claimed in claim 2 in which the source databases are an ERP or CRM database.
8. A computer readable medium encoded with a data structure to examine the staging database metadata and constructs a schedule of operations to extract the data, transform it and load it into the cube by using translation blocks that facilitate lookup tables, recursively expanding ranges, filtering and parsing of mathematical expressions into MDX.
EP10740832.0A 2009-02-10 2010-02-09 Etl builder Withdrawn EP2396753A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2009900510A AU2009900510A0 (en) 2009-02-10 ETL Builder
PCT/AU2010/000135 WO2010091457A1 (en) 2009-02-10 2010-02-09 Etl builder

Publications (2)

Publication Number Publication Date
EP2396753A1 true EP2396753A1 (en) 2011-12-21
EP2396753A4 EP2396753A4 (en) 2014-05-07

Family

ID=42561315

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10740832.0A Withdrawn EP2396753A4 (en) 2009-02-10 2010-02-09 Etl builder

Country Status (6)

Country Link
US (1) US20110320399A1 (en)
EP (1) EP2396753A4 (en)
CN (1) CN102349081A (en)
AU (1) AU2010213347A1 (en)
CA (1) CA2751384A1 (en)
WO (1) WO2010091457A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8041623B1 (en) 2002-11-26 2011-10-18 Trading Technologies International, Inc. Method and interface for historical display of market information
US8447721B2 (en) * 2011-07-07 2013-05-21 Platfora, Inc. Interest-driven business intelligence systems and methods of data analysis using interest-driven data pipelines
US10078685B1 (en) 2012-01-09 2018-09-18 W. C. Taylor, III Data gathering and data re-presentation tools
EP2852901A4 (en) * 2012-05-23 2015-05-13 Bi Builders As Methodology supported business intelligence (bi) software and system
US9928283B2 (en) * 2013-04-15 2018-03-27 Validus Solutions, Inc. Tracing data through a transformation process using tracer codes
US10216814B2 (en) * 2013-05-17 2019-02-26 Oracle International Corporation Supporting combination of flow based ETL and entity relationship based ETL
US9405523B2 (en) * 2013-11-04 2016-08-02 Bank Of America Corporation Automated build and deploy system
JP5847344B1 (en) * 2015-03-24 2016-01-20 株式会社ギックス Data processing system, data processing method, program, and computer storage medium
US11263600B2 (en) 2015-03-24 2022-03-01 4 S Technologies, LLC Automated trustee payments system
US20180300362A1 (en) * 2015-04-11 2018-10-18 Entit Software Llc Dimension data insertion into a dimension table
CN105279138B (en) * 2015-10-10 2018-05-22 苏州工业园区凌志软件股份有限公司 A kind of information research report automatic creation system
CN105512201A (en) * 2015-11-26 2016-04-20 晶赞广告(上海)有限公司 Data collection and processing method and device
FR3061574B1 (en) * 2016-12-30 2019-05-31 Orange METHOD AND DEVICE FOR EXTRACTING RELEVANT DATA IN THE EXECUTION OF A PROCESS
US11074267B2 (en) 2017-03-20 2021-07-27 Sap Se Staged approach to automatic data discovery and performance
US10732948B2 (en) * 2017-12-01 2020-08-04 Jpmorgan Chase Bank, N.A. System and method for implementing automated deployment
US11055310B2 (en) * 2017-12-04 2021-07-06 Bank Of America Corporation SQL server integration services (SSIS) package analyzer
CN111798311A (en) * 2020-07-22 2020-10-20 睿智合创(北京)科技有限公司 Bank risk analysis library platform based on big data, building method and readable medium
US11892977B2 (en) * 2021-10-14 2024-02-06 Vast Data Ltd. Stored entity metadata re-balancing upon storage expansion
US20230281213A1 (en) * 2022-02-03 2023-09-07 Datametica Solutions Private Limited System and method for data warehouse workload transformation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6604110B1 (en) * 2000-08-31 2003-08-05 Ascential Software, Inc. Automated software code generation from a metadata-based repository
US20070203933A1 (en) * 2006-02-24 2007-08-30 Iversen Heine K Method for generating data warehouses and OLAP cubes
WO2007095959A1 (en) * 2006-02-24 2007-08-30 Timextender A/S Method for generating data warehouses and olap cubes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080313628A1 (en) * 2007-06-15 2008-12-18 Microsoft Corporation Metrics pack distribution for data reporting tool

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6604110B1 (en) * 2000-08-31 2003-08-05 Ascential Software, Inc. Automated software code generation from a metadata-based repository
US20070203933A1 (en) * 2006-02-24 2007-08-30 Iversen Heine K Method for generating data warehouses and OLAP cubes
WO2007095959A1 (en) * 2006-02-24 2007-08-30 Timextender A/S Method for generating data warehouses and olap cubes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO2010091457A1 *

Also Published As

Publication number Publication date
US20110320399A1 (en) 2011-12-29
CA2751384A1 (en) 2010-08-19
EP2396753A4 (en) 2014-05-07
CN102349081A (en) 2012-02-08
AU2010213347A1 (en) 2011-08-25
WO2010091457A1 (en) 2010-08-19

Similar Documents

Publication Publication Date Title
US20110320399A1 (en) Etl builder
US20120005153A1 (en) Creation of a data store
Marco Building and managing the meta data repository
US7940899B2 (en) Fraud detection, risk analysis and compliance assessment
Mahanti Data quality: dimensions, measurement, strategy, management, and governance
US6189004B1 (en) Method and apparatus for creating a datamart and for creating a query structure for the datamart
US6212524B1 (en) Method and apparatus for creating and populating a datamart
US9830366B2 (en) Online analytic processing cube with time stamping
US9684703B2 (en) Method and apparatus for automatically creating a data warehouse and OLAP cube
US6161103A (en) Method and apparatus for creating aggregates for use in a datamart
US7117215B1 (en) Method and apparatus for transporting data for data warehousing applications that incorporates analytic data interface
US20130166515A1 (en) Generating validation rules for a data report based on profiling the data report in a data processing tool
US7739224B1 (en) Method and system for creating a well-formed database using semantic definitions
US20100131457A1 (en) Flattening multi-dimensional data sets into de-normalized form
US20080215621A1 (en) Metamodel-based automatic report generation
US20140188784A1 (en) Systems and methods for data-warehousing to facilitate advanced business analytic assessment
US20140214753A1 (en) Systems and methods for multi-source data-warehousing
GB2413665A (en) A data warehouse and OLAP cube
US8280896B2 (en) Reporting row structure for generating reports using focus areas
US7461076B1 (en) Method and apparatus for creating a well-formed database system using a computer
Jukic et al. Expediting analytical databases with columnar approach
US11829950B2 (en) Financial documents examination methods and systems
Che et al. Application and research on business intelligence in audit business
EP1374090A2 (en) Computer method and device for transporting data
Marques PRESENTING BUSINESS INSIGHTS ON ADVANCED PRICING AGREEMENTS USING A BUSINESS INTELLIGENCE FRAMEWORK

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20110805

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

RIN1 Information on inventor provided before grant (corrected)

Inventor name: WILSON, JAMES, HENRY

Inventor name: LIN, FRANK, CHI-JAO

Inventor name: LEDWICH, MARK, JOSEPH

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20140408

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 17/30 20060101ALI20140402BHEP

Ipc: G06Q 10/00 20120101AFI20140402BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20140902