WO2016022019A1 - Method for data input into a database - Google Patents

Method for data input into a database Download PDF

Info

Publication number
WO2016022019A1
WO2016022019A1 PCT/MY2015/050077 MY2015050077W WO2016022019A1 WO 2016022019 A1 WO2016022019 A1 WO 2016022019A1 MY 2015050077 W MY2015050077 W MY 2015050077W WO 2016022019 A1 WO2016022019 A1 WO 2016022019A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
medium
metric
facts
data source
Prior art date
Application number
PCT/MY2015/050077
Other languages
French (fr)
Inventor
Boon Keong SEAH
Kwang Ming NG
Original Assignee
Mimos Berhad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Berhad filed Critical Mimos Berhad
Publication of WO2016022019A1 publication Critical patent/WO2016022019A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Definitions

  • the present invention relates to a method for data input into a database, more particularly the present invention relates to a method for data input into a data warehouse.
  • a data warehouse allows an organization or company to create a consolidated view of its enterprise data, and using the data for reporting and data analysis.
  • a data warehouse can be used to analyse a particular subject area, integrate data from multiple data sources as well as store current and historical data. Creating a data warehouse requires mapping data between sources and targets, then capturing the details of the transformation in a metadata repository.
  • a data warehouse generally comprises numerous fact tables in order to cater for various business process.
  • Patent US 20090055439 Al mentions a method for dynamically adding dimensions specific to a tenant in a data warehouse without changing the structure of the fact table.
  • US 20090055439 Al does not teach of adding dimensions into multiple fact tables as well as restriction to data input as only a predefined number of columns is able to be added to the fact table.
  • Patent US 20120109917 Al teaches a method for maintaining a hierarchy bridge table consistent with a corresponding hierarchy table.
  • US 20120109917 Al supports only a single hierarchy table node.
  • a hierarchy bridge table must always be correct when compared to the corresponding hierarchy to prevent the system from being defective.
  • Patent US 8352458 B2 points out techniques for transforming records prior to loading the records into a data warehouse.
  • the system mentioned in US 8352458 B2 uses the dimension tables for adding foreign keys corresponding to dimension tables with data stored in a staging table before loading to a fact table.
  • the problem in the aforementioned prior arts is that the bridge table is unable to support data loading into multiple fact tables in the data warehouse without changing the structure of the fact table.
  • the present invention addresses this problem.
  • the present invention aims to provide a method for data input into a database whilst maintaining the structure of the metric of facts, known as fact tables.
  • It is an object of the present invention to provide a method for data input into a database comprising the steps of extraction of data from a data source, categorizing data in at least a first medium, providing at least a second medium, generating data mapping relationship from at least a first medium to at least a second medium, transforming said data to at least a metric of fields and loading a database with metric of facts from said at least a second medium. It is a further object of the present invention to provide a method for automation of dynamically adding new categories into multiple metric of facts with respect to at least a first medium for categorizing data .
  • the present invention provides a method for data input into a database for supporting multiple fact tables.
  • Figure 1 illustrates a method of data input into a database according to the present invention.
  • Figure 2 illustrates a graphical representation of a first medium according to the present invention.
  • Figure 3 illustrates a method for assigning surrogate keys in the first medium according to the present invention.
  • Figure 4 illustrates a graphical representation of a second medium according to the present invention.
  • Figure 5 illustrates a method for partitioning the second medium according to the present invention.
  • Figure 6 illustrates a method for updating assigned surrogate key to the second medium according to the present invention.
  • Figure 7 illustrates a method for updating predefined column in each metric of facts with assigned surrogate keys according to the present invention.
  • Figure 8 illustrates a method for adding of a predefined column to a metric of facts according to the present invention.
  • Figure 9 illustrates a flowchart for achieving dynamic dimension process according to the present invention.
  • data is populated into a fact table in a database or data warehouse by first extracting data from a data source (1).
  • This data in its natural key form will be assigned a surrogate key (9) representation.
  • the data extracted from the data source (1) will then be transformed into a metric of facts (4) or also known as fact tables through intermediate mediums or bridge tables such as dimension tables.
  • the measure is calculated and the aggregate value from multiple dimension tables will be loaded to fact tables in a data warehouse.
  • This method of data population is carried out by mapping the aggregate value from a first medium (2) to a predefined metric of facts (4) preceding to transforming data to at least a metric of facts (4), and is capable of maintaining the structure of the metric of facts.
  • data sources (1) are provided with new additional columns.
  • individual dimension tables will be generated to map the natural key with the newly assigned surrogate keys (9). Additional columns have to be created in the existing fact table to accommodate these additional fields. Therefore, changes to the existing extract, transform and load, ETL process from the data source (1) to the fact table are required.
  • the process of transforming data from the data source (1) to dimension tables and finally loading data into to the fact tables has to be repeated for each fact table in the data warehouse with a plurality of fact tables. If the fact tables have large data size, this process will be time consuming. Furthermore, frequent changes on the columns of fact tables and ETL process will cause adverse impact on the database design.
  • the abovementioned problems are solved with the introduction of the present invention.
  • the present invention enables automation of dynamically adding dimensions into multiple fact tables with respect to the data source (1) defined whilst retaining the structure of the fact table.
  • the present invention moreover discloses a method of data mapping that supports multiple fact tables using a prefixed field to reference the dynamic dimension table.
  • FIG. 1 the figure illustrates a method for data input into a database according to the present invention comprising the steps of extracting data from a data source (1), categorizing said data in at least a first medium (2), providing at least a second medium (3), generating data mapping relationship from at least a first medium (2) to at least a second medium (3), transforming said data to at least a metric of facts (4) and loading the database with at least a metric of facts (4) from at least a second medium (3).
  • Figure 2 illustrates at least a first medium (2) shown in Figure 1 wherein at least a first medium categorizes data from the data source (1) according to the present invention.
  • a key step of data input into the database is through categorizing or defining data by at least a first medium (2) in a form of fact definition table.
  • This said at least a first medium (2) or fact definition table has to be defined in order to facilitate the definition of one or more metric of facts (4) or fact tables to be added to the data warehouse.
  • Said at least a first medium (2) comprises a first plurality of fields (5) interlinking with one another for representing the data in categories (6).
  • Said at least a first medium (2) or fact definition table comprises at least a field for identification of the metric of facts (4) (i.e. "Fact ID” column), a field for naming of the metric of facts (4) (i.e. "Fact Name” column), a field indicating the number of categories (6) to be mapped from the data source (1) (i.e. "num_dimension” column), a field for naming of the granularity of data (7) (i.e. "dimension” column) and a field for identifying the data source (1) (i.e. "data source” column). For each of these fields in the fact definition table, a user is to fill in the data accordingly.
  • the abovementioned "Fact ID” uniquely identifies the ID of the fact table in the data warehouse while the "Fact Name” identifies the name of the fact table.
  • the number of categories (6) to be mapped from the data source (1) is defined by the "num_dimension” field.
  • the "dimension” field enables a user to define multiple dimension table name and the column name mapping to the data source (1) table.
  • the field name defined here has the same column name in the data source (1) table to enable matching mapping name.
  • the "data source” column identifies the name of the table of the data source (1).
  • FIG 3 the figure illustrates naming of the granularity of data (7) according to the present invention. The naming of the granularity of data (7) is registered under the "dimension” field mentioned above.
  • This step involves assigning surrogate key (9) to be represented in said at least a second medium (3) prior to be represented in a metric of facts (4).
  • a naming convention for the granularity of data (7) comprises a string of a plurality of category names and plurality of field names in the sequence of field name then category name as seen below:
  • Figure 4 illustrates at least a second medium (3) shown in Figure 1, comprising a second plurality of fields (8) interlinking with one another for generating data mapping relationship according to the present invention.
  • a second medium (3) is provided which generates data mapping relationship from said at least a first medium (2) to said at least a second medium (3).
  • Said at least a second medium (3) also referred to as dynamic dimension table will be defined to facilitate transforming of data to one or more fact tables.
  • the dynamic dimension table comprises a field for identification of the categories (6) (i.e. "DynamicDim ID” column), a field mapping the identification of the metric of facts (4) from at least a first medium (2) (i.e. "Fact ID” column), a field containing surrogate key of categories (6) obtained from at least a first medium (2) (i.e. "key value” column) and a field for identifying name of the data (i.e. "dimension name” column).
  • the "DynamicDim ID” identifies the ID of each dynamic dimension or category from the data source (1).
  • the "Fact ID” maps each fact table with the dynamic dimension table.
  • the "key value” contains the surrogate key (9) of the dimension table.
  • the surrogate key (9) is obtained from mapping of each element in the "dimension” column of the Fact definition table. Furthermore, 'dimension name' column identifies the dimension table name.
  • the figure illustrates a method for partitioning of said at least a second medium (3) based on numbers of each metric of facts (4) in the database according to the present invention.
  • the figure clearly shows that said at least a second medium (3) in that of dynamic dimension table is partitioned based on the number of fact tables in the data warehouse (i.e. "Fact ID column).
  • a predefined column (10) is added to map the surrogate key (9) of the dynamic dimension table.
  • the figure illustrates a method for the storing of mapping relationship from the data source (1) field to at least a second medium (3) through a first medium according to the present invention.
  • the figure shows multiple "data source” fields such as service type dimension and ethnic dimension are mapped and stored in "DynamicDim" table in the steps of:
  • FIG. 7 the figure illustrates a method for assigning of surrogate keys (9) from said at least a second medium (3) to at least a metric of facts (4) according to the present invention.
  • the multiple fact tables or metric of facts (4) are assigned with surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables stored, where for each row of data read from data source (1), row > Numl
  • the figure illustrates a method for updating of a predefined "DynamicDim" column (10) added to existing fact table according to the present invention.
  • a default value of less than zero i.e. -1 is assigned to the predefined column (10) which means that when there is no value in the surrogate key (9) of the dynamic dimension, the function will not run.
  • FIG 9 the figure illustrates the steps for dynamically adding dimensions into a fact table in the form of a flowchart according to the present invention.
  • Data obtained from multiple sources will be categorized into at least a first medium (2), in this case a fact definition table for mapping of relationship between multiple data source (1), column names and multiple dimension table column name for each fact table (21). If the number of rows in the fact definition table exceeds zero (22), the subsequent process entails. Details or columns in the fact definition table will be read and a temporary variable is initialized to zero (23). The dynamic dimension table will be partitioned based on each fact table in the data warehouse (24). If numbers of dimension or fields exceeds the temporary variable, the process will loop (25).
  • mapping of the relationship between multiple data source (1) and multiple dimension table column will be stored through means of parsing the multiple dimension name, column name and data source (31) and then assigning temporary variables to them (32).
  • multiple fact tables will be assigned surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables (41).
  • the assigned surrogate keys (9) will be updated to a column in the dynamic dimension table (42) for each metric of fact (4).
  • the data is then populated into the data base (1) through updating of each column in the metric of facts with assigned surrogate keys (43).
  • the process flow will be repeated if the data source (1) is larger than Numl (27).

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A database is a central source of data for query and analysis. Functions of a database include retrieving, analyzing, transforming, loading and managing data; making them available for further use. Disclosed herein is a method for improved data input into a database whilst maintaining the structure of metric of facts. Data is extracted from a data source (1), transformed and loaded into fact tables in a data warehouse by means of bridge tables. Automation of dynamically adding dimensions into multiple fact tables with respect to the data source (1) is disclosed in the present invention.

Description

METHOD FOR DATA INPUT INTO A DATABASE FIELD OF TECHNOLOGY The present invention relates to a method for data input into a database, more particularly the present invention relates to a method for data input into a data warehouse.
BACKGROUND OF THE INVENTION
A data warehouse allows an organization or company to create a consolidated view of its enterprise data, and using the data for reporting and data analysis. A data warehouse can be used to analyse a particular subject area, integrate data from multiple data sources as well as store current and historical data. Creating a data warehouse requires mapping data between sources and targets, then capturing the details of the transformation in a metadata repository.
One of the challenges faced is that frequent changes occurs on the management to analyse the data differently. For example, analysing the location of patient for a disease, distribution of bed in a hospital and patient history on top of data for the type of disease. Due to the fact that frequent new requests are made to meet new scope of data analysis, additional data or columns have to be added into the fact table. This affects significant process time if the fact table is huge. In addition, changes has to be made to the existing ETL (Extraction, Transformation and Loading) process of data transformation to fact table. Further compounding to the issue, a data warehouse generally comprises numerous fact tables in order to cater for various business process.
Patent US 20090055439 Al mentions a method for dynamically adding dimensions specific to a tenant in a data warehouse without changing the structure of the fact table. However, US 20090055439 Al does not teach of adding dimensions into multiple fact tables as well as restriction to data input as only a predefined number of columns is able to be added to the fact table.
Patent US 20120109917 Al teaches a method for maintaining a hierarchy bridge table consistent with a corresponding hierarchy table. However, US 20120109917 Al supports only a single hierarchy table node. Furthermore, a hierarchy bridge table must always be correct when compared to the corresponding hierarchy to prevent the system from being defective.
Patent US 8352458 B2 points out techniques for transforming records prior to loading the records into a data warehouse. The system mentioned in US 8352458 B2 uses the dimension tables for adding foreign keys corresponding to dimension tables with data stored in a staging table before loading to a fact table.
The problem in the aforementioned prior arts is that the bridge table is unable to support data loading into multiple fact tables in the data warehouse without changing the structure of the fact table. The present invention addresses this problem.
SUMMARY OF THE INVENTION The present invention aims to provide a method for data input into a database whilst maintaining the structure of the metric of facts, known as fact tables.
It is an object of the present invention to provide a method for data input into a database comprising the steps of extraction of data from a data source, categorizing data in at least a first medium, providing at least a second medium, generating data mapping relationship from at least a first medium to at least a second medium, transforming said data to at least a metric of fields and loading a database with metric of facts from said at least a second medium. It is a further object of the present invention to provide a method for automation of dynamically adding new categories into multiple metric of facts with respect to at least a first medium for categorizing data .
It is a further object of the invention to provide a method for supporting multiple metric of facts by having a prefix column in the metric of facts referencing to at least a second medium.
It is a further object of the present invention to provide a method for data input in order to avoid changes to existing extract, transform and load process of data transformation to the metric of facts. The present invention provides a method for data input into a database for supporting multiple fact tables.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 illustrates a method of data input into a database according to the present invention.
Figure 2 illustrates a graphical representation of a first medium according to the present invention.
Figure 3 illustrates a method for assigning surrogate keys in the first medium according to the present invention.
Figure 4 illustrates a graphical representation of a second medium according to the present invention.
Figure 5 illustrates a method for partitioning the second medium according to the present invention.
Figure 6 illustrates a method for updating assigned surrogate key to the second medium according to the present invention.
Figure 7 illustrates a method for updating predefined column in each metric of facts with assigned surrogate keys according to the present invention.
Figure 8 illustrates a method for adding of a predefined column to a metric of facts according to the present invention.
Figure 9 illustrates a flowchart for achieving dynamic dimension process according to the present invention. DETAILED DESCRIPTION OF THE EMBODIMENTS
Described below are various preferred embodiments of the present invention with reference to the accompanying drawings. Each of the following preferred embodiments describes an example in which a method of data input into a database may be implemented and the improvements over existing prior art. The configuration of the invention is not limited to the configuration mentioned in the following description.
It will be appreciated that data is populated into a fact table in a database or data warehouse by first extracting data from a data source (1). This data in its natural key form will be assigned a surrogate key (9) representation. The data extracted from the data source (1) will then be transformed into a metric of facts (4) or also known as fact tables through intermediate mediums or bridge tables such as dimension tables. For each row of data population, the measure is calculated and the aggregate value from multiple dimension tables will be loaded to fact tables in a data warehouse. This method of data population is carried out by mapping the aggregate value from a first medium (2) to a predefined metric of facts (4) preceding to transforming data to at least a metric of facts (4), and is capable of maintaining the structure of the metric of facts.
As demand increases by the management for more analysis on the data from the data warehouse, data sources (1) are provided with new additional columns. For each additional column or field in the data source (1), individual dimension tables will be generated to map the natural key with the newly assigned surrogate keys (9). Additional columns have to be created in the existing fact table to accommodate these additional fields. Therefore, changes to the existing extract, transform and load, ETL process from the data source (1) to the fact table are required. The process of transforming data from the data source (1) to dimension tables and finally loading data into to the fact tables has to be repeated for each fact table in the data warehouse with a plurality of fact tables. If the fact tables have large data size, this process will be time consuming. Furthermore, frequent changes on the columns of fact tables and ETL process will cause adverse impact on the database design. The abovementioned problems are solved with the introduction of the present invention. The present invention enables automation of dynamically adding dimensions into multiple fact tables with respect to the data source (1) defined whilst retaining the structure of the fact table. The present invention moreover discloses a method of data mapping that supports multiple fact tables using a prefixed field to reference the dynamic dimension table.
Referring to Figure 1, the figure illustrates a method for data input into a database according to the present invention comprising the steps of extracting data from a data source (1), categorizing said data in at least a first medium (2), providing at least a second medium (3), generating data mapping relationship from at least a first medium (2) to at least a second medium (3), transforming said data to at least a metric of facts (4) and loading the database with at least a metric of facts (4) from at least a second medium (3). Referring to Figure 1 and Figure 2, Figure 2 illustrates at least a first medium (2) shown in Figure 1 wherein at least a first medium categorizes data from the data source (1) according to the present invention. A key step of data input into the database is through categorizing or defining data by at least a first medium (2) in a form of fact definition table. This said at least a first medium (2) or fact definition table has to be defined in order to facilitate the definition of one or more metric of facts (4) or fact tables to be added to the data warehouse.
Said at least a first medium (2) comprises a first plurality of fields (5) interlinking with one another for representing the data in categories (6). Said at least a first medium (2) or fact definition table comprises at least a field for identification of the metric of facts (4) (i.e. "Fact ID" column), a field for naming of the metric of facts (4) (i.e. "Fact Name" column), a field indicating the number of categories (6) to be mapped from the data source (1) (i.e. "num_dimension" column), a field for naming of the granularity of data (7) (i.e. "dimension" column) and a field for identifying the data source (1) (i.e. "data source" column). For each of these fields in the fact definition table, a user is to fill in the data accordingly.
The abovementioned "Fact ID" uniquely identifies the ID of the fact table in the data warehouse while the "Fact Name" identifies the name of the fact table. The number of categories (6) to be mapped from the data source (1) is defined by the "num_dimension" field. The "dimension" field enables a user to define multiple dimension table name and the column name mapping to the data source (1) table. The field name defined here has the same column name in the data source (1) table to enable matching mapping name. The "data source" column identifies the name of the table of the data source (1). Referring to Figure 3, the figure illustrates naming of the granularity of data (7) according to the present invention. The naming of the granularity of data (7) is registered under the "dimension" field mentioned above. This step involves assigning surrogate key (9) to be represented in said at least a second medium (3) prior to be represented in a metric of facts (4). A naming convention for the granularity of data (7) comprises a string of a plurality of category names and plurality of field names in the sequence of field name then category name as seen below:
<Dimension Namel xColumn Separator Begin xColumn Namel xColumn Separator End xSeparatorxDimension Name2xColumn Separator Begin xColumn Name2xColumn Separator End >.. <SeparatorxDimension Name(n) >
An example of the above naming convention is:
Ethnic'Src Ethnic ID' 11 Service Type'Src Service Type' Referring to Figure 1 and Figure 4, Figure 4 illustrates at least a second medium (3) shown in Figure 1, comprising a second plurality of fields (8) interlinking with one another for generating data mapping relationship according to the present invention. Once the data has been categorized in at least a first medium (2) in that of the fact definition table, at least a second medium (3) is provided which generates data mapping relationship from said at least a first medium (2) to said at least a second medium (3). Said at least a second medium (3) also referred to as dynamic dimension table will be defined to facilitate transforming of data to one or more fact tables.
The dynamic dimension table comprises a field for identification of the categories (6) (i.e. "DynamicDim ID" column), a field mapping the identification of the metric of facts (4) from at least a first medium (2) (i.e. "Fact ID" column), a field containing surrogate key of categories (6) obtained from at least a first medium (2) (i.e. "key value" column) and a field for identifying name of the data (i.e. "dimension name" column).
The "DynamicDim ID" identifies the ID of each dynamic dimension or category from the data source (1). The "Fact ID" maps each fact table with the dynamic dimension table. The "key value" contains the surrogate key (9) of the dimension table. The surrogate key (9) is obtained from mapping of each element in the "dimension" column of the Fact definition table. Furthermore, 'dimension name' column identifies the dimension table name.
Referring to Figure 5, the figure illustrates a method for partitioning of said at least a second medium (3) based on numbers of each metric of facts (4) in the database according to the present invention. The figure clearly shows that said at least a second medium (3) in that of dynamic dimension table is partitioned based on the number of fact tables in the data warehouse (i.e. "Fact ID column). For the every fact table in the data warehouse, a predefined column (10) is added to map the surrogate key (9) of the dynamic dimension table.
Referring to Figure 6, the figure illustrates a method for the storing of mapping relationship from the data source (1) field to at least a second medium (3) through a first medium according to the present invention. The figure shows multiple "data source" fields such as service type dimension and ethnic dimension are mapped and stored in "DynamicDim" table in the steps of:
i. determining number of categories (6) or dimensions (num_dimension) ii. parsing at least a second medium (3) or each dimension table and the field of "dimension" column, followed by reading of table name in "data source" column.
iii. assigning temporary variables to dimension table name, column name and data source (1) name
iv. executing function Num=Num+ l
The table name of the data source (1) from the database will then be read, initializing Numl(Numl = 0) Referring to Figure 7, the figure illustrates a method for assigning of surrogate keys (9) from said at least a second medium (3) to at least a metric of facts (4) according to the present invention. The multiple fact tables or metric of facts (4) are assigned with surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables stored, where for each row of data read from data source (1), row > Numl
i. assigning a surrogate key (9) to each unique combination of values based on each column name of dimension table name defined in the Fact Definition matching column name defined in the data source (1) ii. updating assigned surrogate key (9) to a field (ie: "Key Value" column) in the dynamic dimension table for each fact table
iii. updating a predefined column (10) (ie: "DynamicDim" column) in each fact table in the data warehouse with the assigned surrogate keys (9) of the dynamic dimension table
iv. executing Increment Numl =Numl + 1
Referring to figure 8, the figure illustrates a method for updating of a predefined "DynamicDim" column (10) added to existing fact table according to the present invention. A default value of less than zero (i.e. -1) is assigned to the predefined column (10) which means that when there is no value in the surrogate key (9) of the dynamic dimension, the function will not run.
Referring to Figure 9, the figure illustrates the steps for dynamically adding dimensions into a fact table in the form of a flowchart according to the present invention. Data obtained from multiple sources will be categorized into at least a first medium (2), in this case a fact definition table for mapping of relationship between multiple data source (1), column names and multiple dimension table column name for each fact table (21). If the number of rows in the fact definition table exceeds zero (22), the subsequent process entails. Details or columns in the fact definition table will be read and a temporary variable is initialized to zero (23). The dynamic dimension table will be partitioned based on each fact table in the data warehouse (24). If numbers of dimension or fields exceeds the temporary variable, the process will loop (25). The mapping of the relationship between multiple data source (1) and multiple dimension table column will be stored through means of parsing the multiple dimension name, column name and data source (31) and then assigning temporary variables to them (32). The table name of the data source (1) from the database will be read, initializing Numl (Numl =0) (26). Finally, multiple fact tables will be assigned surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables (41). The assigned surrogate keys (9) will be updated to a column in the dynamic dimension table (42) for each metric of fact (4). The data is then populated into the data base (1) through updating of each column in the metric of facts with assigned surrogate keys (43). The process flow will be repeated if the data source (1) is larger than Numl (27).
In as much as the present invention is subject to many variations, modifications and changes in detail, it is intended that all matter contained in the foregoing description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

Claims

A method for data input into a database comprising the steps of:
extracting data from a data source (1);
categorizing said data in at least a first medium (2);
providing at least a second medium (3);
generating data mapping relationship from at least a first medium
(2) to at least a second medium
(3);
transforming said data to at least a metric of facts (4); and
loading the database with said at least a metric of facts (4);
characterized in that the method further comprising a step of mapping a value from said at least a first medium (2) to a predefined metric in said at least a metric of facts (4) preceding to said step of transforming said data to at least a metric of facts
(4) which is capable of maintaining structure of the metric of facts.
The method of claim 1, wherein said at least a first medium (2) comprises a first plurality of fields (5) interlinking with one another for representing the data in categories.
The method of claim 2, wherein said first plurality of fields
(5) comprises a combination of at least a field for:
identifying the metric of facts (4);
naming the metric of facts (4);
indicating the number of categories (6) to be mapped from the data source;
naming the granularity of data (7) for forming a part of fields in said at least a second medium (3); and
identifying the data source (1).
The method of claim 3, wherein said granularity of data (7) comprises a string of a plurality of category (6) names and plurality of field names.
The method of claim 1, wherein said at least a second medium (3) comprises a second plurality of fields (8) interlinking with one another for generating data mapping relationship between said at least a first medium (2) and at least a second medium (3).
6. The method of claim 5, wherein said second plurality of fields (8) comprises a combination of at least a field for:
identifying the categories (6);
mapping the identification of the metric of facts (4) from at least a first medium (2);
storing surrogate key (9) of categories (6) obtained from said at least a first medium (2); and
identifying name of the data source (1).
7. The method of claim 6, wherein said at least a second medium (3) is partitioned based on numbers of each metric of facts (4) in the database.
8. The method of claim 1, wherein the mapping of relationship between data source (1) field and at least a second medium (3) is stored comprising the steps of: determining number of categories (6);
parsing said at least a second medium (3), the fields of categories (6) and identified data source (1);
assigning temporary variables to said at least a second medium (3), the fields of categories (6) and data source (1); and
repeating abovementioned steps for every number of categories (6).
9. The method of claim 8, wherein the data source (1) field is read and a recurring function initialized.
10. The method of claim 1, wherein said at least a metric of facts (4) is assigned with surrogate keys (9) by steps of:
assigning the surrogate keys (9) to each combination of values based on each field name of said at least a second medium (3) defined in the data source (i);
updating the surrogate keys (9) to said at least a second medium (3) for each of the metric of facts (4);
updating a predefined field (10) in each metric of facts (4) in the database with assigned surrogate keys (9) of said at least a second medium (3); and
executing a recurring function.
11. The method of claim 10, wherein said predefined (10) field in the existing metric of facts (4) is assigned a value of less than zero then a recurring function is executed.
PCT/MY2015/050077 2014-08-05 2015-07-28 Method for data input into a database WO2016022019A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2014702153 2014-08-05
MYPI2014702153A MY187720A (en) 2014-08-05 2014-08-05 Method for data input into a database

Publications (1)

Publication Number Publication Date
WO2016022019A1 true WO2016022019A1 (en) 2016-02-11

Family

ID=54145972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2015/050077 WO2016022019A1 (en) 2014-08-05 2015-07-28 Method for data input into a database

Country Status (2)

Country Link
MY (1) MY187720A (en)
WO (1) WO2016022019A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947861A (en) * 2017-11-09 2019-06-28 北京京东尚科信息技术有限公司 The method, apparatus and computer-readable medium of object table are generated for data warehouse
JP7051108B2 (en) 2016-02-26 2022-04-11 クリスプ インテリジェンス プロプライエタリ リミテッド Fact Category Partitioning Unknown to Data Source Systems Methods for Inserting and Retrieving Data Using Information Repositories and Information Repositories
CN114595294A (en) * 2022-03-11 2022-06-07 北京梦诚科技有限公司 Data warehouse modeling and extracting method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090055439A1 (en) 2007-08-24 2009-02-26 Ketera Technologies, Inc. Flexible Dimension Approach In A Data Warehouse
US20120109917A1 (en) 2010-10-27 2012-05-03 Comeau Gregory P System and method for synchronously updating a hierarchy bridge table
US8352458B2 (en) 2008-05-07 2013-01-08 Oracle International Corporation Techniques for transforming and loading data into a fact table in a data warehouse
US20140188784A1 (en) * 2012-12-28 2014-07-03 Joseph Guerra Systems and methods for data-warehousing to facilitate advanced business analytic assessment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090055439A1 (en) 2007-08-24 2009-02-26 Ketera Technologies, Inc. Flexible Dimension Approach In A Data Warehouse
US8352458B2 (en) 2008-05-07 2013-01-08 Oracle International Corporation Techniques for transforming and loading data into a fact table in a data warehouse
US20120109917A1 (en) 2010-10-27 2012-05-03 Comeau Gregory P System and method for synchronously updating a hierarchy bridge table
US20140188784A1 (en) * 2012-12-28 2014-07-03 Joseph Guerra Systems and methods for data-warehousing to facilitate advanced business analytic assessment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Extract, transform, load - Wikipedia, the free encyclopedia", 9 January 2013 (2013-01-09), XP055120766, Retrieved from the Internet <URL:http://wayback.archive.org/web/20130109062117/http://en.wikipedia.org/wiki/Extract,_transform,_load> [retrieved on 20140528] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7051108B2 (en) 2016-02-26 2022-04-11 クリスプ インテリジェンス プロプライエタリ リミテッド Fact Category Partitioning Unknown to Data Source Systems Methods for Inserting and Retrieving Data Using Information Repositories and Information Repositories
CN109947861A (en) * 2017-11-09 2019-06-28 北京京东尚科信息技术有限公司 The method, apparatus and computer-readable medium of object table are generated for data warehouse
CN114595294A (en) * 2022-03-11 2022-06-07 北京梦诚科技有限公司 Data warehouse modeling and extracting method and system

Also Published As

Publication number Publication date
MY187720A (en) 2021-10-14

Similar Documents

Publication Publication Date Title
US9519687B2 (en) Minimizing index maintenance costs for database storage regions using hybrid zone maps and indices
CN103577440B (en) A kind of data processing method and device in non-relational database
JP7046172B2 (en) Computer implementation methods, computer program products, and systems for storing records in shard database shard tables, computer implementation methods, computer program products, and systems for retrieving records from shard database shard tables. System, as well as a system for storing shard databases
US9576011B2 (en) Indexing hierarchical data
JP7011848B2 (en) Systems, methods, and data structures for fast retrieval or filtering of large datasets
US9665600B2 (en) Method for implementing database
US20200004851A1 (en) Trie-based indices for databases
US10452676B2 (en) Managing database with counting bloom filters
US9129010B2 (en) System and method of partitioned lexicographic search
US20180144061A1 (en) Edge store designs for graph databases
US20140052727A1 (en) Data processing for database aggregation operation
CN103002061A (en) Method and device for mutual conversion of long domain names and short domain names
US20080294673A1 (en) Data transfer and storage based on meta-data
WO2016022019A1 (en) Method for data input into a database
CN114840487A (en) Metadata management method and device for distributed file system
US8949282B1 (en) Efficient storage of non-searchable attributes
CN116680278B (en) Data processing method, device, electronic equipment and storage medium
CN104636471A (en) Procedure code finding method and device
US20180144060A1 (en) Processing deleted edges in graph databases
CN104750743A (en) System and method for ticking and rechecking transaction files
CN106845787A (en) A kind of data method for automatically exchanging and device
US8150887B1 (en) Identifiers for non-searchable attributes
CN116450607A (en) Data processing method, device and storage medium
US8930336B2 (en) Retrieval of searchable and non-searchable attributes
US8171054B1 (en) Optimized fetching for customization object attributes

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15763988

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15763988

Country of ref document: EP

Kind code of ref document: A1