WO2017143405A1 - A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository - Google Patents

A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository Download PDF

Info

Publication number
WO2017143405A1
WO2017143405A1 PCT/AU2017/050166 AU2017050166W WO2017143405A1 WO 2017143405 A1 WO2017143405 A1 WO 2017143405A1 AU 2017050166 W AU2017050166 W AU 2017050166W WO 2017143405 A1 WO2017143405 A1 WO 2017143405A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
fact
storing
partition
partitions
Prior art date
Application number
PCT/AU2017/050166
Other languages
English (en)
French (fr)
Inventor
Vaughan NOTHNAGEL
Original Assignee
Cryspintel Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2016900704A external-priority patent/AU2016900704A0/en
Application filed by Cryspintel Pty Ltd filed Critical Cryspintel Pty Ltd
Priority to JP2018544361A priority Critical patent/JP7051108B2/ja
Priority to EP17755658.6A priority patent/EP3403200A4/en
Priority to AU2017224831A priority patent/AU2017224831B2/en
Priority to CN201780013415.4A priority patent/CN108701154B/zh
Priority to US16/078,603 priority patent/US11372880B2/en
Priority to SG11201806825RA priority patent/SG11201806825RA/en
Publication of WO2017143405A1 publication Critical patent/WO2017143405A1/en
Priority to HK18114174.9A priority patent/HK1255050A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24554Unary operations; Data partitioning operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087Inventory or stock management, e.g. order filling, procurement or balancing against orders

Definitions

  • a data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository
  • the present invention relates generally to data warehousing and in particular to a data source system agnostic fact category partitioned information repository and associated methods for the insertion and retrieval of data using the information repository.
  • the main approaches to data warehousing comprise the dimensional approach and the normalized approach.
  • the prior art data dimensional data warehouse 26 comprises an associated subject area 27.
  • a sales transaction can be broken up into facts 28 such as the number of products ordered and the price paid for the products, and into dimensions 29 such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.
  • a key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. Dimensional structures are easy to understand for business users, because the structure is divided into measurements/facts and context/dimensions. Facts are related to the organisation's business processes and operational system whereas the dimensions surrounding them provide context. Another advantage offered by the dimensional model is that it does not always require relational database queries. Thus, this type of modelling technique is very useful for end-user queries.
  • the present invention seeks to provide a static and configurable data warehouse structure, which will overcome or substantially ameliorate at least some, if not all, of the deficiencies of the prior art, or to at least provide a sustainable alternative.
  • the present invention relates to a data repository that is both transactional data and data source system agnostic.
  • the present data repository 8 is divided into particular fact partitions 16 (categorised by the type or category of data, rather than subject area) and particular customisable dimensions stored in relation to the fact partitions 16.
  • mappings/plugins 19 are utilised to map/translate data received from different data source systems into an appropriate format for storage within the particular fact partitions and associated dimensions.
  • the present data repository can be used for receiving and sharing data from differing operational and other source systems, without modification to the underlying fact partitions, existing reporting, analytical processes and dimensions.
  • additional contextual information may be stored within the present data repository by adding columns to the relevant shared dimensions database table or tables.
  • data may be received from a plurality of data source systems (such as human resources, payroll, e-commerce, retail, production warehousing, inventory control and other types of operational systems) without necessitating the inclusion of additional fact tables.
  • data source systems such as human resources, payroll, e-commerce, retail, production warehousing, inventory control and other types of operational systems
  • source system specific mappings may be used to map/decompose/partition the data received from each different operational system into the common repository format for the underlying fact partitions and dimensions.
  • the data may be used as is within the repository, or alternatively "reverse-mapped” or “reconstructed” using relevant source system specific mappings to map the data stored within the underlying fact partitions and dimensions into data appropriate for each operational system.
  • present embodiments may use a publish/push instead of pull method to allow execution of data transfer at the time of the business transaction.
  • present fact partitioned repository 8 may accept all new data streams with no schema redesign.
  • the present fact partitioned repository 8 negates the need for addition of new fact types when new business constructs are added.
  • the present fact partitioned repository structure allows for a static load processes, which means that only the data extraction from the repository may need configuring thereby avoiding staging during data input wherein records are cleansed at source.
  • the present fact partitioned repository provides granular, deconstructed and reliable (direct from source) data thereby eliminating the need for data marts/cubes. As the data is deconstructed during the PTL process into the relevant fact partitions and associated dimensions directly from its "source of truth", there is no requirement for a data mart or cube between the repository and the analytics tools in that the present repository itself becomes that data mart or cube. The granularity of the data also facilitates an "apples to apples" comparison, and delivers consistency and reliability.
  • US 2010/0070421 Al discloses a data warehouse system for managing performance of organisations.
  • the data warehouse system of Dl comprises a data model for storing data representing dimensions and measures applicable for multiple organisations, and a configuration unit for setting the placeholders such that the data model represents the particular organisation. This constitutes a single fact definition used in the same context across many organisations and not a multi-category collection of fact structures.
  • Dl is directed to providing a particular database structure for recording operational performance and is therefore not directed to the problem of providing operational system agnostic storage of data for reporting or analytical purpose.
  • the data warehouse record is clearly organised by the underlying business function, or "subject area” (e.g. "sales analysis”) rather than the data type, or category.
  • US 2009/0271345 Al discloses a data warehouse that is constructed using the relational mapping of a transactional database without reconstructing the data relationships of the transactional database.
  • an application programmer analyses an object model in order to describe facts and dimensions using the objects, attributes, and paths of the object model. Each of the dimensions has an identifier that correlates an item in the transactional database to a dimension record in the data warehouse. The fact and dimension descriptions are saved to a description file.
  • a Data Warehouse Engine then access the description file and uses the object model, fact and dimension descriptions, and object-relational mapping to map transactional data to the data warehouse. This constitutes a single fact category definition for singular data warehousing retention and analysis.
  • D2 cites singular mapping of the source transaction by object definition, effectively a singular "subject area" associated to a single transaction type that has no scope beyond the current object type nor is there any ability to deliver flexibility of transactional data context since the object definition is static.
  • US 2010/0106747 Al HONZAL et al.
  • D3 discloses populating data marts with dimensional data models from a set of data repositories that contain factual and association information about a set of related assets are disclosed.
  • An intermediate data warehouse is generated to process the facts and associations for each asset.
  • one or more data marts are generated with fact tables, dimensions, and hierarchies to fully model the information available for each asset.
  • D3 indicates a 'staging' relational database applicable to data that is already in a data warehouse repository or similar (specifically referred to in D3 as an "intermediate data warehouse"). Such staging occurs prior to entering the cubed form and serves to clearly articulate the current data warehouse static dimensions related to one information type only.
  • US 2003/0233297 Al discloses a transaction-related dimension of tax-related data for generating fact details to facilitate payment of taxes.
  • a transaction-related dimension of tax-related data is provided along with a plurality of attributes for the transaction- related dimension.
  • attributes include a transaction line item determined based on a transaction identifier, a transaction type, a tax type, a customer account identifier, a sold to location geographic code, a ship to location geographic code, a contract number, a purchase order number, a vendor account identifier, and a vendor zip code.
  • a plurality of entries are received which are associated with the attributes of the transaction-related dimension.
  • a plurality of fact details are then generated using the entries of a predetermined set of the attributes of the transaction-related dimension. Thereafter, the fact details are outputted.
  • D4 represents a single business type, or subject area (Taxation) and has no associated capability to store transactions from other business types (e.g. Mining and Manufacturing), also, there is no context capability beyond the single taxation model in the financial services so such items as a mortgage or credit agreement has no place in this model.
  • US 2008/0120129 Al (hereafter "D5") discloses a business object model, which reflects data that is used during a given business transaction, utilised to generate interfaces.
  • This business object model facilitates commercial transactions by providing consistent interfaces that are suitable for use across industries, across businesses, and across different departments within a business during a business transaction.
  • D6 distinctly describes a normal data warehouse whose user perspective of the subject area specific data (rather than the categorisation of the data) are made more usable by redefinition in a view, as opposed to any transactional decomposition.
  • the data source system agnostic information repository of the present embodiments differs from the dimensional approach and the normalised approach of the prior art describe above.
  • the data source system agnostic information repository of present embodiments may be thought of as being more related to the prior art dimensional approach than the normalised approach
  • the data source system agnostic information repository of present embodiments differs primarily from the prior art dimensional approach in that facts, in accordance with present embodiments, are partitioned in accordance with data category or type, rather than subject area.
  • prior art dimensional facts are normally organised by subject area, rather than data type or category.
  • the specific data categories disclosed herein allow for a "universally descriptive" data source system agnostic information repository being able to store transactional, and other data relating to most, if not all, possible scenarios of the physical or logical world thereby overcoming problem of the conventional dimensional approach arrangements where it is difficult to modify the data source system agnostic information repository if the organisation adopting the dimensional approach changes business process, or wishes to introduce data from additional data source systems.
  • the data source system agnostic information repository further confers further technical advantages in the simplified and efficient retrieval of data stored within the data source system agnostic information repository.
  • the data source system agnostic information repository comprises dimensional commonality between the partitioned data categories by utilising specific dimension types, simplifying the data source system agnostic information repository resulting in simplified and efficient insertion and select queries and the like.
  • a data source system agnostic fact partitioned data information repository system comprising: a data repository comprising: a plurality of fact partitions; a plurality of dimensions stored in relation to the fact partitions, the plurality of dimensions shared by each of the fact partitions; and a plurality of data source system specific data mappings; a data receiver for receiving data from the plurality of data source systems; and a data mapper for partitioning the data into the plurality of fact partitions using the plurality of data source system specific data mappings.
  • the plurality of fact partitions may comprise an event fact partition for storing an event occurrence.
  • the plurality of fact partitions may comprise a quantity fact partition for storing a quantity.
  • the plurality of fact partitions may comprise a monetary fact partition for storing a monetary amount.
  • the plurality of fact partitions may comprise a GIS fact partition for storing a GIS location.
  • the plurality of fact partitions may comprise a percentile fact partition for storing a percentile value.
  • the plurality of fact partitions may comprise a reference fact partition for storing a reference value.
  • the plurality of fact partitions may comprise an unstructured fact partition for storing a link to unstructured data stored either within the data warehouse or in a different location.
  • a least one fact partition data types may be at least two fact partition data categories and wherein storing the at least two fact partition data types may comprise storing the at least two fact partition data types in at least two of the fact partitions with each comprising a timestamp value; and wherein retrieving data from the repository may comprise joining the at least two fact partition data types using by timestamp value in order to re-construct a source transaction.
  • the plurality of dimensions may comprise a product dimension capable of storing product-related data.
  • the plurality of dimensions may comprise an asset dimension capable of storing asset-related data.
  • the plurality of dimensions may comprise a location dimension capable of storing location-related data.
  • the plurality of dimensions may comprise at least one of either physical or logical location-related data.
  • the plurality of dimensions may comprise an entity dimension capable of storing entity- related data.
  • a data source system agnostic fact category partitioned data information repository system comprising: a data repository comprising: a plurality of fact partitions comprising: an event fact partition for storing an event; a quantity fact partition for storing a quantity; a monetary fact partition for storing a monetary amount; a GIS fact partition for storing a GIS location; a percentile fact partition for storing a percentile value; and a reference fact partition for storing a reference value; a plurality of dimensions stored in relation to the fact partitions, the plurality of dimensions shared by each of the fact partitions, the plurality of dimensions comprising: a product dimension capable of storing product-related data; an asset dimension capable of storing asset-related data; a location dimension capable of storing location-related data; and an entity dimension capable of storing entity-related data; and a plurality of data source system specific data mappings; a data receiver for receiving data from the plurality of data source systems; and a data map
  • a method for storing data within a data source system agnostic information repository system comprising an data repository comprising: a plurality of fact partitions, partitioned by fact partition data types; a plurality of dimensions stored in relation to the fact partitions, the plurality of dimensions shared by each of the fact partitions, the method comprising: receiving data; partitioning the data into at least one fact partition data type; storing the at least one fact partition data type in at least one of the plurality of fact partitions; generating dimensional data; and storing the dimensional data in at least one of the plurality of dimensions in relation to the at least one of the plurality of fact partitions.
  • the data may be received from at least two data sources and wherein the partitioning may comprise partitioning the data by data source.
  • Figure 1 shows a prior art dimensional approach data warehouse
  • Figure 2 shows a data source system agnostic repository in accordance with an embodiment
  • Figure 3 further shows the data source system agnostic information repository in accordance with an embodiment
  • Figure 4 shows an exemplary scenario wherein the data source system agnostic information repository of Figure 3 is applied to a product purchase event transaction;
  • Figure 5 shows an exemplary scenario wherein the data source system agnostic information repository of Figure 3 is applied to a delivery truck movement event transaction
  • Figure 6 shows an exemplary entity relationship diagram for the data source system agnostic information repository in accordance with an embodiment.
  • FIG 3 there is shown a system 1 comprising a data source system agnostic information repository 8.
  • the repository 8 comprises a plurality of particular fact-category partitions 16.
  • the fact partitions 16 are partitioned by partition data types (such as event, quantitative, monetary, percentile, location, reference and unstructured data link datatypes) as will be described in further detail below.
  • each fact partition comprises a fact category and an associated fact definition/description.
  • the repository 8 comprises a plurality of shared dimensions 9 stored in relation to the fact partitions 16.
  • Each partition 16 may have dimensional commonality.
  • each of the shared dimensions 9 is generally uniformly shared by each of the partitioned data types.
  • dimensional commonality simplifies the structure of the repository 8 in that the repository 8 may be implemented by a finite number of tables, thereby simplifying insert and select queries. Furthermore, if business process customisation is required, additional columns may be utilised within the dimensional tables as in lieu of requiring new tables as is the case for the normalised approach.
  • storing data within the repository 8 comprises the system 1 receiving data 20 from source systems 21.
  • the source systems 21 may be Enterprise Resource Planning (ERP) Point of Sale (PoS), Inventory Management, Logistics or Customer Relationship (CRM) type systems belonging to differing operational systems such as HR, operations, accounting and the like.
  • ERP Enterprise Resource Planning
  • PoS Point of Sale
  • CRM Customer Relationship
  • system 1 may be configured with various plug-ins 19 (Such as, for example, FTP Files, Integration Middleware Transactions or Data files) for retrieving and processing the data.
  • plug-ins 19 Such as, for example, FTP Files, Integration Middleware Transactions or Data files
  • the plug-ins 19 may comprise a data receiver module 18 configured to fetch data from various source data systems 21.
  • a plug-in 19 may be a plug-in to the SAP Plant Maintenance (SAP PM) source ERP module.
  • SAP PM SAP Plant Maintenance
  • the data receiver module 18 may be an Enterprise Service Bus (ESB) receiver listening for transactional events in substantial real-time. In alternative embodiments, the data receiver module 18 may fetch or receive data at periodic intervals.
  • ESD Enterprise Service Bus
  • the plug-ins 19 may further comprise a data mapping module 24 configured to map the received data into the various fact partitions 16 (and, in embodiments, the shared dimensions 9) in accordance with the specific data source system 21.
  • the data mapping module 24 may utilise a plurality of data source system specific data mappings 17 for mapping the data per fact category specifically for differing types of data source systems 21.
  • Data mapping differs from the regular data warehousing approaches in that the load process is tightly coupled to the database structures 9 and 16 and holds no direct coupling to the source system transaction 20 thereby making the repository data source agnostic. Similarly, the data source system may change for the production of the originating data without the data warehouse needing to change in any form. This static nature of the repository ensures that robustness of structure remains and no data ever becomes obsolete even if the business changes.
  • the mapping for this "Load" process is the component that is unique to the client for their informational need.
  • the repository 8 utilises fact partitions 16 partitioned by specific partitioned data categories.
  • the specific fact partition data categories confer "universal type descriptor capabilities" to the repository 8 in being able to store data in relation to most, if not all, conceivable scenarios from potentially any number of data source systems 21.
  • the fact partitions 16 comprise all of the types of fact partitions as shown in Figure 3. However, in certain embodiments, and albeit with potential limitation to the differing types of transactions that may be stored within the repository 8 (which may not be an issue for particular data source systems and associated transactions).
  • the monetary fact partition 12 may be omitted for data source systems 21 not dealing with monetary based transactions.
  • the fact partitions 16 may comprises an event fact partition 10.
  • the event fact partition 10 may store an event type such as a purchase event, the delivery event, a hire of an employee event, a vehicle repair event, a child's birth event and other types of events that are by nature singular in their representation.
  • an event type such as a purchase event, the delivery event, a hire of an employee event, a vehicle repair event, a child's birth event and other types of events that are by nature singular in their representation.
  • the event data category may comprise an enumeration data category.
  • the fact partitions 16 may further comprises a quantitative fact partition 11 capable of storing a quantity.
  • the quantitative fact partition 11 may comprise a numeric value.
  • the quantity may represent a number of units sold and therefore comprise an integer data type.
  • the quantity may represent a weight of goods received and therefore comprise a floating point data type.
  • the fact partitions 16 may further comprise a monetary fact partition 12 capable of storing a monetary amount.
  • the monetary fact partition 12 may comprise a numeric value.
  • the monetary amount may be that a product was bought for $8.25 and sold for $10.59.
  • the monetary fact data category may be a floating point data type or the like capable of storing the monetary amounts to at least two decimal places and potentially more.
  • the fact partitions 16 may further comprises a GIS fact partition 13 capable of storing a GIS location.
  • the GIS fact partition 13 may store the fact that an asset is currently located at a particular location.
  • the GIS fact partition 17 data category may comprise a struct data type comprising two floating point data types so as to be able to represent latitude and longitude.
  • the fact partitions 16 may further comprises a percentile fact partition 14 capable of storing a percentile value.
  • the percentile fact partition 14 may store a value-added tax (VAT) percentile value.
  • VAT value-added tax
  • the percentile fact partition 14 may store a numeric data type, such as an integer, floating point value or the like.
  • the fact partitions 16 may further comprises a reference fact partition 15 capable of storing reference values.
  • the reference fact partition 15 is used for storing references, such as invoice numbers, part numbers and the like.
  • the reference fact partition 15 may, for example, utilise a Varchar data type capable of storing both string and numeric data values.
  • the fact partition 16 may further comprise an unstructured data fact 15 which may be utilised for storing links to unstructured data.
  • the unstructured data may represent a URL or other resource locator locating particular legal documents.
  • the shared dimensions 9 are common to each of the fact partitions 16 albeit that specific fact categories may not require association to every one of the dimensions.
  • the product dimension 3 may be shared by each of the fact partitions 16.
  • the repository 8 may record any of that 1) a product type was sold; 2) three products were sold; 3) three products were sold for $10.59; 4) three products were sold for $10.59 at a particular location; 5) three products were sold for $10.59 (excluding VAT at 10%) at a particular location; 6) three products were sold for $10.59 (excluding VAT at 10%) at a particular location with sale reference "SAL-13262” and 7) three products were sold for $10.59 (excluding VAT at 10%) at a particular location with sale reference "SAL-13262" having a PDF receipt accessible using a particular URL.
  • the data source system agnostic information repository 8 comprises all of the shared dimensions 9 as shown in Figure 3.
  • a subset context descriptive ability of the repository 8 a subset of the shared dimensions 9 may be employed (which may not be problematic for certain types of data source systems 21 dealing only in certain types of transactions).
  • the shared dimensions 9 may comprise a product dimension 3.
  • the product dimension 3 may store information relating to commercial products (and services). For example, iron ore grades, construction element types (Concrete, reinforcing bar etc.), Banking Account types, Retail items, Cars, Schools, medical procedures or any collective grouping association for any of the fact partitions.
  • the shared dimensions 9 may further comprise an asset dimension 3 configured for storing data relating to various assets.
  • asset dimension 3 configured for storing data relating to various assets.
  • a vehicle from the logistics system a crane, a crusher, an x-ray machine, a watch, a mobile phone, a projector or laptop, in essence, any tangible item that may or may not have business or personal value associated.
  • the shared dimensions 9 may further comprise a location dimension 23 for storing location information such as physical or logical information.
  • the shared dimensions 9 may further comprise an entity dimension 5 for storing information in relation to various entities.
  • the entities dimension 5 may further comprise person and company information (not shown) for storing information specific to persons and companies.
  • the shared dimensions 9 may further comprise additional dimensions 6 and 7 for storing further as-yet unspecified dimension types. These will be specific per organisations need and are included in the design to permit unique extension of the model for a specific requirement.
  • a join select query is employed utilising the unique timestamp data to re-assemble the relevant data if and when required.
  • the specific timestamp data column within each of the fact partitions 16 may be configured as being unique.
  • FIG 4 there is shown an exemplary application of the repository 8 for storing a product purchase event transaction.
  • the product purchase event transaction may be initially recorded by an e-commerce data source system 21 which may be retrieved, at periodic intervals, within other e-commerce transactional data using the data receiver 18.
  • the data mapping 24 maps the received transactional data 20 into the relevant fact partitions 16 and links this to the shared dimensions 9 utilising the data source system specific data mappings 17.
  • the customer buying a widget may be resolved into a purchase event transaction by the source system 21 and then mapped into the event partition table 10 representing a purchase event and linked to the product shared dimension table 3 identifying the widget.
  • the order quantity may be stored within the quantitative fact 11 partition, the unit price may be stored in the monetary fact partition 12 and the 10% tax may be stored in the percentile fact partition 14.
  • the unique timestamp which may be obtained from the data itself, or generated from the system clock for example, is stored within each of the fact partitions so as to allow the subsequent retrieval therefrom utilising a join select statement.
  • the associated order quantity, unit price and tax amount may be retrieved from the qualitative fact partition 11, monetary fact partition 12 and percentile fact partition 14 utilising the same timestamp.
  • FIG 5 there is shown a yet further exemplary application of the repository 8 for storing a delivery truck movement event.
  • a delivery truck moves from a first location (comprising latitude and longitude coordinates) at a first time to a second location at a second time.
  • the data may be partitioned into the event fact partition 10 and the GIS fact partition 13.
  • the vehicle ID, cargo and customer number may be stored as additional organisational specific context by configuring additional columns in the shared dimensions 9 tables linked by foreign key to the event fact partition 10 and the GIS fact partition 13.
  • the first and second timestamps corresponding to the first and second locations are also stored within the GIS fact partition 13.
  • FIG 6 there is shown an exemplary entity relationship diagram of the repository 8 wherein, for the represented entity relationships, 1 represents a single record, 0..1 represents zero or one records, 1..1 represents a one-to-one relationship and 1..* represents a one-to-many relationship.
  • categorised fact partition categories are shown in rectangles having dashed lines with straight corners having respective associated transactional/descriptive facts shown in rectangles having solid lines and rounded corners.
  • dimension enrichment is shown in rectangles with solid lines having straight edges wherein, for example, the entity dimension 5 is shown as potentially having an associated entity type representing a person or a company entity.
  • joining lines represent foreign key relationships and adjoining lines having arrowheads represent parent-child relationships.
  • the below exemplary technical scenario is implemented on a Microsoft SQL ® 2016 database using Microsoft .Net programming languages and XML structures.
  • the same may be implemented using any relational database (e.g. Oracle, DB2, MySql) or Columnar Database (e.g. NoSQL, MongoDB), any programming language (e.g. Python, Java etc.) and any message structure technology (e.g. Html, JSON etc.).
  • relational database e.g. Oracle, DB2, MySql
  • Columnar Database e.g. NoSQL, MongoDB
  • any programming language e.g. Python, Java etc.
  • message structure technology e.g. Html, JSON etc.
  • point of sale solution is an Oracle based transactional solution with specific transactional reference to value, product, quantity and in some cases customer information of rewards based customers.
  • SQL database tables are designed and built to represent dimensions 9.
  • Product is loaded via a Product Load .Net Program from an explicitly designed XML structure aligned to the Product dimension table.
  • the product master source provides the complete list of products for the client in a spreadsheet, this is parsed into the Product XML Structure and the Product
  • Location is loaded via a Location Load .Net Program from an explicitly designed XML structure aligned to the Location dimension table.
  • the location master source provides the complete list of locations both logical and physical of relevance to the client (Outlet, Division, Department, Building) in a client spreadsheet, this is parsed into the Location XM L and the Location Dimension load initiated;
  • Entity is loaded via an Entity Load .Net Program from an explicitly designed XM L structure aligned to the Entity dimension table.
  • the entity master source provides the complete list of customers of relevance to the client from the rewards program file in a client spreadsheet, this is parsed into the Entity XML and the Entity Dimension load initiated;
  • the source database is a normalised database to 3rd normal form with more than 300 tables in the operational database.
  • a Database Trigger is built into the Oracle Database on the primary transaction table to initiate a custom built stored procedure.
  • the custom stored procedure 19 creates multiple XML records based on the user's documented requirement as follows: - a.
  • a "Sale” Quantitative Partition Fact in XML format with full key value pair reference to the Product (Product Dimension 3), the quantity of the product in the sale transaction (Quantitative Fact 11), the date and time of the transaction (Time Stamp), the sale transaction customer (Product Dimension 3) and the location record (Location Dimension 23) that represents the retail outlet;
  • the client uses a MicroStrategy Business Intelligence mining tool to surface the stored data in either a report or dashboard/graphical format.
  • the financial system of record is a cloud based MYOB solution with Web Service API interfaces for extraction of financial transactions and other detail from the system of record.
  • a file listener is triggered by the arrival of the output file in this example (if there are any records found otherwise nothing is triggered) and it in turn initiates a 'Procurement Mapping' .Net program which maps the "Purchase" information into multiple records as below:
  • the client uses a MicroStrategy Business Intelligence mining tool to surface the stored data in either a report or dashboard/graphical format. Note that there are now 2 transaction types loaded into the same structure, that allows for greater analysis and reporting versatility.
  • HR human resources
  • the HR system of record is an on premise installed solution using SQL Server database technology behind a Windows ® Forms application front-end.
  • a custom HR source system Stored Procedure is built to create output records for each person from the system based on the last updated timestamp in the source system.
  • the program runs every hour as a SQL Scheduled Task, extracting Human Resource data and mapping it into the existing organisations Entity dimension XML structure.
  • A.Net matching program on Given Name, Surname and Date of birth is added to the existing Entity dimension load .Net Program to ensure duplicates are not created inadvertently.
  • Extracted data in the modified Entity XM L structure (employee ID was added to initial framework definitions) is mapped into the XML structure and after checking for duplicates on an "Insert" SQL instruction, records are either "Created”, “Updated” or “Deleted” (Logical record de-activation only) as per the update checks added.
  • a file listener is triggered by the arrival of the output file in this example (if there are any records found otherwise nothing is triggered for Sundays) and it in turn initiates a 'Timesheet Mapping' .Net program which maps the "Timesheet" information into multiple records as below:
  • the client uses a MicroStrategy Business Intelligence mining tool to surface the stored data in either a report or dashboard/graphical format. Note that there are now 4 types loaded in the same repository structure, that allows for greater analysis and reporting versatility.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Small-Scale Networks (AREA)
PCT/AU2017/050166 2016-02-26 2017-02-24 A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository WO2017143405A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
JP2018544361A JP7051108B2 (ja) 2016-02-26 2017-02-24 データソースシステムに不可知のファクトカテゴリパーティション化情報リポジトリ及び情報リポジトリを用いてデータを挿入及び検索するための方法
EP17755658.6A EP3403200A4 (en) 2016-02-26 2017-02-24 DATA SOURCE CATEGORY OF AGNOSTIC FACTS OF DATA SOURCE SYSTEM AND METHODS FOR INSERTING AND RETRIEVING DATA USING THE INFORMATION REFERENTIAL
AU2017224831A AU2017224831B2 (en) 2016-02-26 2017-02-24 A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository
CN201780013415.4A CN108701154B (zh) 2016-02-26 2017-02-24 数据源系统不可知的事实类别分区信息存储库以及用于使用信息存储库插入和检索数据的方法
US16/078,603 US11372880B2 (en) 2016-02-26 2017-02-24 Data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository
SG11201806825RA SG11201806825RA (en) 2016-02-26 2017-02-24 A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository
HK18114174.9A HK1255050A1 (zh) 2016-02-26 2018-11-07 數據源系統不可知的事實類別分區信息存儲庫以及用於使用信息存儲庫插入和檢索數據的方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2016900704 2016-02-26
AU2016900704A AU2016900704A0 (en) 2016-02-26 A subject agnostic fact data type partitioned data warehouse structure and methods for the insertion and retrieval of data using the data warehouse structure

Publications (1)

Publication Number Publication Date
WO2017143405A1 true WO2017143405A1 (en) 2017-08-31

Family

ID=59684681

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2017/050166 WO2017143405A1 (en) 2016-02-26 2017-02-24 A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository

Country Status (8)

Country Link
US (1) US11372880B2 (ja)
EP (1) EP3403200A4 (ja)
JP (1) JP7051108B2 (ja)
CN (1) CN108701154B (ja)
AU (1) AU2017224831B2 (ja)
HK (1) HK1255050A1 (ja)
SG (1) SG11201806825RA (ja)
WO (1) WO2017143405A1 (ja)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11163742B2 (en) * 2019-01-10 2021-11-02 Microsoft Technology Licensing, Llc System and method for generating in-memory tabular model databases
CN110674130A (zh) * 2019-08-30 2020-01-10 深圳鸿智云创科技有限公司 数据传输方法
CN111241121A (zh) * 2019-12-30 2020-06-05 航天信息(山东)科技有限公司 一种基于elasticsearch父子关系的海量发票数据查询方法及系统
CN111680506A (zh) * 2020-04-28 2020-09-18 北京三快在线科技有限公司 数据库表的外键映射方法、装置、电子设备和存储介质
CN112256745A (zh) * 2020-10-27 2021-01-22 武汉市钱鲸科技有限公司 一种零售数据分析方法
CN112328551A (zh) * 2020-11-09 2021-02-05 医渡云(北京)技术有限公司 医疗数据解析方法、装置、介质及电子设备
CN113297333A (zh) * 2021-03-17 2021-08-24 无锡极数宝大数据科技有限公司 数据处理方法、装置、服务器及存储介质
CN113350884B (zh) * 2021-06-04 2022-04-15 浙江斯普智能科技股份有限公司 二次精密过滤器

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233297A1 (en) 1999-08-31 2003-12-18 Accenture Properties (2) B.V. System, method and article of manufacture for organizing and managing transaction-related tax information
US20050033726A1 (en) 2003-05-19 2005-02-10 Ju Wu Apparatus and method for accessing diverse native data sources through a metadata interface
US20070239711A1 (en) 2006-03-28 2007-10-11 Stefan Unnebrink Mapping of a transactional data model to a reporting data model
US20080120129A1 (en) 2006-05-13 2008-05-22 Michael Seubert Consistent set of interfaces derived from a business object model
US20090271345A1 (en) 2008-04-25 2009-10-29 Lawrence Scott Rich Method and Apparatus for Declarative Data Warehouse Definition for Object-Relational Mapped Objects
US20100070421A1 (en) 2001-01-19 2010-03-18 International Business Machines Corporation Data warehouse system
US20100106747A1 (en) 2008-10-23 2010-04-29 Benjamin Honzal Dynamically building and populating data marts with data stored in repositories

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2355959A1 (en) * 2001-06-27 2002-12-27 Mapfusion Corp. Spatial business intelligence system
US7809678B2 (en) * 2004-07-09 2010-10-05 Microsoft Corporation Fact dimensions in multidimensional databases
CA2551199A1 (en) 2006-06-23 2007-12-23 Cognos Incorporated System and method of member unique names
EP2041676A4 (en) * 2006-06-26 2012-05-16 Nielsen Co Us Llc METHOD AND DEVICES FOR IMPROVING THE DATA WAREHOUSE EFFICIENCY
US8296336B2 (en) * 2008-05-02 2012-10-23 Oracle International Corp. Techniques for efficient dataloads into partitioned tables using swap tables
US8904381B2 (en) * 2009-01-23 2014-12-02 Hewlett-Packard Development Company, L.P. User defined data partitioning (UDP)—grouping of data based on computation model
US20110295795A1 (en) * 2010-05-28 2011-12-01 Oracle International Corporation System and method for enabling extract transform and load processes in a business intelligence server
US9298787B2 (en) 2011-11-09 2016-03-29 International Business Machines Corporation Star and snowflake schemas in extract, transform, load processes
US8676772B2 (en) 2011-12-09 2014-03-18 Telduráðgevin Sp/f Systems and methods for improving database performance
US9081875B2 (en) * 2011-12-30 2015-07-14 General Electric Company Systems and methods for organizing clinical data using models and frames
US9026562B2 (en) * 2012-10-05 2015-05-05 Hazeltree Fund Services, Inc. Methods and systems for agnostic data storage
US20140279658A1 (en) * 2013-03-12 2014-09-18 United Parcel Service Of America, Inc. Systems and methods of suggesting attended delivery/pickup locations
US20140278834A1 (en) * 2013-03-14 2014-09-18 Armchair Sports Productions Inc. Voting on actions for an event
US9613068B2 (en) * 2013-03-15 2017-04-04 Amazon Technologies, Inc. Scalable analysis platform for semi-structured data
US9519695B2 (en) * 2013-04-16 2016-12-13 Cognizant Technology Solutions India Pvt. Ltd. System and method for automating data warehousing processes
US9507838B2 (en) * 2013-05-17 2016-11-29 Oracle International Corporation Use of projector and selector component types for ETL map design
US9244978B2 (en) * 2014-06-11 2016-01-26 Oracle International Corporation Custom partitioning of a data stream
MY187720A (en) 2014-08-05 2021-10-14 Mimos Berhad Method for data input into a database
US10877995B2 (en) * 2014-08-14 2020-12-29 Intellicus Technologies Pvt. Ltd. Building a distributed dwarf cube using mapreduce technique
CN104462430B (zh) * 2014-12-12 2017-12-22 北京国双科技有限公司 关系型数据库的数据处理方法及装置
US9965514B2 (en) * 2014-12-19 2018-05-08 Software Ag Usa, Inc. Techniques for real-time generation of temporal comparative and superlative analytics in natural language for real-time dynamic data analytics
US11036752B2 (en) * 2015-07-06 2021-06-15 Oracle International Corporation Optimizing incremental loading of warehouse data
US20170017683A1 (en) * 2015-07-13 2017-01-19 28msec Systems And Methods For Storing And Interacting With Data From Heterogeneous Data Sources
US10354188B2 (en) * 2016-08-02 2019-07-16 Microsoft Technology Licensing, Llc Extracting facts from unstructured information

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233297A1 (en) 1999-08-31 2003-12-18 Accenture Properties (2) B.V. System, method and article of manufacture for organizing and managing transaction-related tax information
US20100070421A1 (en) 2001-01-19 2010-03-18 International Business Machines Corporation Data warehouse system
US20050033726A1 (en) 2003-05-19 2005-02-10 Ju Wu Apparatus and method for accessing diverse native data sources through a metadata interface
US20070239711A1 (en) 2006-03-28 2007-10-11 Stefan Unnebrink Mapping of a transactional data model to a reporting data model
US20080120129A1 (en) 2006-05-13 2008-05-22 Michael Seubert Consistent set of interfaces derived from a business object model
US20090271345A1 (en) 2008-04-25 2009-10-29 Lawrence Scott Rich Method and Apparatus for Declarative Data Warehouse Definition for Object-Relational Mapped Objects
US20100106747A1 (en) 2008-10-23 2010-04-29 Benjamin Honzal Dynamically building and populating data marts with data stored in repositories

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3403200A4

Also Published As

Publication number Publication date
HK1255050A1 (zh) 2019-08-02
JP2019506685A (ja) 2019-03-07
CN108701154A (zh) 2018-10-23
EP3403200A4 (en) 2019-12-25
SG11201806825RA (en) 2018-09-27
JP7051108B2 (ja) 2022-04-11
CN108701154B (zh) 2022-05-03
US20190050464A1 (en) 2019-02-14
US11372880B2 (en) 2022-06-28
AU2017224831A1 (en) 2018-08-30
AU2017224831B2 (en) 2023-01-05
EP3403200A1 (en) 2018-11-21

Similar Documents

Publication Publication Date Title
AU2017224831B2 (en) A data source system agnostic fact category partitioned information repository and methods for the insertion and retrieval of data using the information repository
US8041760B2 (en) Service oriented architecture for a loading function in a data integration platform
US7814470B2 (en) Multiple service bindings for a real time data integration service
US7814142B2 (en) User interface service for a services oriented architecture in a data integration platform
US8060553B2 (en) Service oriented architecture for a transformation function in a data integration platform
US20050232046A1 (en) Location-based real time data integration services
US20050262190A1 (en) Client side interface for real time data integration jobs
US20050240354A1 (en) Service oriented architecture for an extract function in a data integration platform
US20050228808A1 (en) Real time data integration services for health care information data integration
US20050262189A1 (en) Server-side application programming interface for a real time data integration service
US20050234969A1 (en) Services oriented architecture for handling metadata in a data integration platform
US20050235274A1 (en) Real time data integration for inventory management
US20050240592A1 (en) Real time data integration for supply chain management
US20050222931A1 (en) Real time data integration services for financial information data integration
US20110313969A1 (en) Updating historic data and real-time data in reports
US20050223109A1 (en) Data integration through a services oriented architecture
US20060010195A1 (en) Service oriented architecture for a message broker in a data integration platform
US20120030256A1 (en) Common Modeling of Data Access and Provisioning for Search, Query, Reporting and/or Analytics
JP2008511928A (ja) メタデータの管理
US20160259831A1 (en) Methodology supported business intelligence (BI) software and system
US20230177025A1 (en) Processes and systems for onboarding data for a digital duplicate
Weber et al. Database Systems
US11551464B2 (en) Line based matching of documents
Lake et al. A history of databases
Domdouzis et al. A History of Databases

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 11201806825R

Country of ref document: SG

WWE Wipo information: entry into national phase

Ref document number: 2017755658

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2018544361

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2017755658

Country of ref document: EP

Effective date: 20180816

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017224831

Country of ref document: AU

Date of ref document: 20170224

Kind code of ref document: A