WO2016022019A1 - Method for data input into a database - Google Patents
Method for data input into a database Download PDFInfo
- Publication number
- WO2016022019A1 WO2016022019A1 PCT/MY2015/050077 MY2015050077W WO2016022019A1 WO 2016022019 A1 WO2016022019 A1 WO 2016022019A1 MY 2015050077 W MY2015050077 W MY 2015050077W WO 2016022019 A1 WO2016022019 A1 WO 2016022019A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- medium
- metric
- facts
- data source
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 52
- 230000001131 transforming effect Effects 0.000 claims abstract description 9
- 238000013507 mapping Methods 0.000 claims description 14
- 238000013506 data mapping Methods 0.000 claims description 7
- 238000004458 analytical method Methods 0.000 abstract description 4
- 230000008569 process Effects 0.000 description 12
- 238000007405 data analysis Methods 0.000 description 2
- 238000013501 data transformation Methods 0.000 description 2
- 201000010099 disease Diseases 0.000 description 2
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000007726 management method Methods 0.000 description 2
- 238000000638 solvent extraction Methods 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 238000013329 compounding Methods 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Definitions
- the present invention relates to a method for data input into a database, more particularly the present invention relates to a method for data input into a data warehouse.
- a data warehouse allows an organization or company to create a consolidated view of its enterprise data, and using the data for reporting and data analysis.
- a data warehouse can be used to analyse a particular subject area, integrate data from multiple data sources as well as store current and historical data. Creating a data warehouse requires mapping data between sources and targets, then capturing the details of the transformation in a metadata repository.
- a data warehouse generally comprises numerous fact tables in order to cater for various business process.
- Patent US 20090055439 Al mentions a method for dynamically adding dimensions specific to a tenant in a data warehouse without changing the structure of the fact table.
- US 20090055439 Al does not teach of adding dimensions into multiple fact tables as well as restriction to data input as only a predefined number of columns is able to be added to the fact table.
- Patent US 20120109917 Al teaches a method for maintaining a hierarchy bridge table consistent with a corresponding hierarchy table.
- US 20120109917 Al supports only a single hierarchy table node.
- a hierarchy bridge table must always be correct when compared to the corresponding hierarchy to prevent the system from being defective.
- Patent US 8352458 B2 points out techniques for transforming records prior to loading the records into a data warehouse.
- the system mentioned in US 8352458 B2 uses the dimension tables for adding foreign keys corresponding to dimension tables with data stored in a staging table before loading to a fact table.
- the problem in the aforementioned prior arts is that the bridge table is unable to support data loading into multiple fact tables in the data warehouse without changing the structure of the fact table.
- the present invention addresses this problem.
- the present invention aims to provide a method for data input into a database whilst maintaining the structure of the metric of facts, known as fact tables.
- It is an object of the present invention to provide a method for data input into a database comprising the steps of extraction of data from a data source, categorizing data in at least a first medium, providing at least a second medium, generating data mapping relationship from at least a first medium to at least a second medium, transforming said data to at least a metric of fields and loading a database with metric of facts from said at least a second medium. It is a further object of the present invention to provide a method for automation of dynamically adding new categories into multiple metric of facts with respect to at least a first medium for categorizing data .
- the present invention provides a method for data input into a database for supporting multiple fact tables.
- Figure 1 illustrates a method of data input into a database according to the present invention.
- Figure 2 illustrates a graphical representation of a first medium according to the present invention.
- Figure 3 illustrates a method for assigning surrogate keys in the first medium according to the present invention.
- Figure 4 illustrates a graphical representation of a second medium according to the present invention.
- Figure 5 illustrates a method for partitioning the second medium according to the present invention.
- Figure 6 illustrates a method for updating assigned surrogate key to the second medium according to the present invention.
- Figure 7 illustrates a method for updating predefined column in each metric of facts with assigned surrogate keys according to the present invention.
- Figure 8 illustrates a method for adding of a predefined column to a metric of facts according to the present invention.
- Figure 9 illustrates a flowchart for achieving dynamic dimension process according to the present invention.
- data is populated into a fact table in a database or data warehouse by first extracting data from a data source (1).
- This data in its natural key form will be assigned a surrogate key (9) representation.
- the data extracted from the data source (1) will then be transformed into a metric of facts (4) or also known as fact tables through intermediate mediums or bridge tables such as dimension tables.
- the measure is calculated and the aggregate value from multiple dimension tables will be loaded to fact tables in a data warehouse.
- This method of data population is carried out by mapping the aggregate value from a first medium (2) to a predefined metric of facts (4) preceding to transforming data to at least a metric of facts (4), and is capable of maintaining the structure of the metric of facts.
- data sources (1) are provided with new additional columns.
- individual dimension tables will be generated to map the natural key with the newly assigned surrogate keys (9). Additional columns have to be created in the existing fact table to accommodate these additional fields. Therefore, changes to the existing extract, transform and load, ETL process from the data source (1) to the fact table are required.
- the process of transforming data from the data source (1) to dimension tables and finally loading data into to the fact tables has to be repeated for each fact table in the data warehouse with a plurality of fact tables. If the fact tables have large data size, this process will be time consuming. Furthermore, frequent changes on the columns of fact tables and ETL process will cause adverse impact on the database design.
- the abovementioned problems are solved with the introduction of the present invention.
- the present invention enables automation of dynamically adding dimensions into multiple fact tables with respect to the data source (1) defined whilst retaining the structure of the fact table.
- the present invention moreover discloses a method of data mapping that supports multiple fact tables using a prefixed field to reference the dynamic dimension table.
- FIG. 1 the figure illustrates a method for data input into a database according to the present invention comprising the steps of extracting data from a data source (1), categorizing said data in at least a first medium (2), providing at least a second medium (3), generating data mapping relationship from at least a first medium (2) to at least a second medium (3), transforming said data to at least a metric of facts (4) and loading the database with at least a metric of facts (4) from at least a second medium (3).
- Figure 2 illustrates at least a first medium (2) shown in Figure 1 wherein at least a first medium categorizes data from the data source (1) according to the present invention.
- a key step of data input into the database is through categorizing or defining data by at least a first medium (2) in a form of fact definition table.
- This said at least a first medium (2) or fact definition table has to be defined in order to facilitate the definition of one or more metric of facts (4) or fact tables to be added to the data warehouse.
- Said at least a first medium (2) comprises a first plurality of fields (5) interlinking with one another for representing the data in categories (6).
- Said at least a first medium (2) or fact definition table comprises at least a field for identification of the metric of facts (4) (i.e. "Fact ID” column), a field for naming of the metric of facts (4) (i.e. "Fact Name” column), a field indicating the number of categories (6) to be mapped from the data source (1) (i.e. "num_dimension” column), a field for naming of the granularity of data (7) (i.e. "dimension” column) and a field for identifying the data source (1) (i.e. "data source” column). For each of these fields in the fact definition table, a user is to fill in the data accordingly.
- the abovementioned "Fact ID” uniquely identifies the ID of the fact table in the data warehouse while the "Fact Name” identifies the name of the fact table.
- the number of categories (6) to be mapped from the data source (1) is defined by the "num_dimension” field.
- the "dimension” field enables a user to define multiple dimension table name and the column name mapping to the data source (1) table.
- the field name defined here has the same column name in the data source (1) table to enable matching mapping name.
- the "data source” column identifies the name of the table of the data source (1).
- FIG 3 the figure illustrates naming of the granularity of data (7) according to the present invention. The naming of the granularity of data (7) is registered under the "dimension” field mentioned above.
- This step involves assigning surrogate key (9) to be represented in said at least a second medium (3) prior to be represented in a metric of facts (4).
- a naming convention for the granularity of data (7) comprises a string of a plurality of category names and plurality of field names in the sequence of field name then category name as seen below:
- Figure 4 illustrates at least a second medium (3) shown in Figure 1, comprising a second plurality of fields (8) interlinking with one another for generating data mapping relationship according to the present invention.
- a second medium (3) is provided which generates data mapping relationship from said at least a first medium (2) to said at least a second medium (3).
- Said at least a second medium (3) also referred to as dynamic dimension table will be defined to facilitate transforming of data to one or more fact tables.
- the dynamic dimension table comprises a field for identification of the categories (6) (i.e. "DynamicDim ID” column), a field mapping the identification of the metric of facts (4) from at least a first medium (2) (i.e. "Fact ID” column), a field containing surrogate key of categories (6) obtained from at least a first medium (2) (i.e. "key value” column) and a field for identifying name of the data (i.e. "dimension name” column).
- the "DynamicDim ID” identifies the ID of each dynamic dimension or category from the data source (1).
- the "Fact ID” maps each fact table with the dynamic dimension table.
- the "key value” contains the surrogate key (9) of the dimension table.
- the surrogate key (9) is obtained from mapping of each element in the "dimension” column of the Fact definition table. Furthermore, 'dimension name' column identifies the dimension table name.
- the figure illustrates a method for partitioning of said at least a second medium (3) based on numbers of each metric of facts (4) in the database according to the present invention.
- the figure clearly shows that said at least a second medium (3) in that of dynamic dimension table is partitioned based on the number of fact tables in the data warehouse (i.e. "Fact ID column).
- a predefined column (10) is added to map the surrogate key (9) of the dynamic dimension table.
- the figure illustrates a method for the storing of mapping relationship from the data source (1) field to at least a second medium (3) through a first medium according to the present invention.
- the figure shows multiple "data source” fields such as service type dimension and ethnic dimension are mapped and stored in "DynamicDim" table in the steps of:
- FIG. 7 the figure illustrates a method for assigning of surrogate keys (9) from said at least a second medium (3) to at least a metric of facts (4) according to the present invention.
- the multiple fact tables or metric of facts (4) are assigned with surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables stored, where for each row of data read from data source (1), row > Numl
- the figure illustrates a method for updating of a predefined "DynamicDim" column (10) added to existing fact table according to the present invention.
- a default value of less than zero i.e. -1 is assigned to the predefined column (10) which means that when there is no value in the surrogate key (9) of the dynamic dimension, the function will not run.
- FIG 9 the figure illustrates the steps for dynamically adding dimensions into a fact table in the form of a flowchart according to the present invention.
- Data obtained from multiple sources will be categorized into at least a first medium (2), in this case a fact definition table for mapping of relationship between multiple data source (1), column names and multiple dimension table column name for each fact table (21). If the number of rows in the fact definition table exceeds zero (22), the subsequent process entails. Details or columns in the fact definition table will be read and a temporary variable is initialized to zero (23). The dynamic dimension table will be partitioned based on each fact table in the data warehouse (24). If numbers of dimension or fields exceeds the temporary variable, the process will loop (25).
- mapping of the relationship between multiple data source (1) and multiple dimension table column will be stored through means of parsing the multiple dimension name, column name and data source (31) and then assigning temporary variables to them (32).
- multiple fact tables will be assigned surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables (41).
- the assigned surrogate keys (9) will be updated to a column in the dynamic dimension table (42) for each metric of fact (4).
- the data is then populated into the data base (1) through updating of each column in the metric of facts with assigned surrogate keys (43).
- the process flow will be repeated if the data source (1) is larger than Numl (27).
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A database is a central source of data for query and analysis. Functions of a database include retrieving, analyzing, transforming, loading and managing data; making them available for further use. Disclosed herein is a method for improved data input into a database whilst maintaining the structure of metric of facts. Data is extracted from a data source (1), transformed and loaded into fact tables in a data warehouse by means of bridge tables. Automation of dynamically adding dimensions into multiple fact tables with respect to the data source (1) is disclosed in the present invention.
Description
METHOD FOR DATA INPUT INTO A DATABASE FIELD OF TECHNOLOGY The present invention relates to a method for data input into a database, more particularly the present invention relates to a method for data input into a data warehouse.
BACKGROUND OF THE INVENTION
A data warehouse allows an organization or company to create a consolidated view of its enterprise data, and using the data for reporting and data analysis. A data warehouse can be used to analyse a particular subject area, integrate data from multiple data sources as well as store current and historical data. Creating a data warehouse requires mapping data between sources and targets, then capturing the details of the transformation in a metadata repository.
One of the challenges faced is that frequent changes occurs on the management to analyse the data differently. For example, analysing the location of patient for a disease, distribution of bed in a hospital and patient history on top of data for the type of disease. Due to the fact that frequent new requests are made to meet new scope of data analysis, additional data or columns have to be added into the fact table. This affects significant process time if the fact table is huge. In addition, changes has to be made to the existing ETL (Extraction, Transformation and Loading) process of data transformation to fact table. Further compounding to the issue, a data warehouse generally comprises numerous fact tables in order to cater for various business process.
Patent US 20090055439 Al mentions a method for dynamically adding dimensions specific to a tenant in a data warehouse without changing the structure of the fact table. However, US 20090055439 Al does not teach of adding dimensions into multiple fact tables as well as restriction to data input as only a predefined number of columns is able to be added to the fact table.
Patent US 20120109917 Al teaches a method for maintaining a hierarchy bridge table consistent with a corresponding hierarchy table. However, US 20120109917 Al supports only a single hierarchy table node. Furthermore, a hierarchy bridge table must
always be correct when compared to the corresponding hierarchy to prevent the system from being defective.
Patent US 8352458 B2 points out techniques for transforming records prior to loading the records into a data warehouse. The system mentioned in US 8352458 B2 uses the dimension tables for adding foreign keys corresponding to dimension tables with data stored in a staging table before loading to a fact table.
The problem in the aforementioned prior arts is that the bridge table is unable to support data loading into multiple fact tables in the data warehouse without changing the structure of the fact table. The present invention addresses this problem.
SUMMARY OF THE INVENTION The present invention aims to provide a method for data input into a database whilst maintaining the structure of the metric of facts, known as fact tables.
It is an object of the present invention to provide a method for data input into a database comprising the steps of extraction of data from a data source, categorizing data in at least a first medium, providing at least a second medium, generating data mapping relationship from at least a first medium to at least a second medium, transforming said data to at least a metric of fields and loading a database with metric of facts from said at least a second medium. It is a further object of the present invention to provide a method for automation of dynamically adding new categories into multiple metric of facts with respect to at least a first medium for categorizing data .
It is a further object of the invention to provide a method for supporting multiple metric of facts by having a prefix column in the metric of facts referencing to at least a second medium.
It is a further object of the present invention to provide a method for data input in order to avoid changes to existing extract, transform and load process of data transformation to the metric of facts.
The present invention provides a method for data input into a database for supporting multiple fact tables.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 illustrates a method of data input into a database according to the present invention.
Figure 2 illustrates a graphical representation of a first medium according to the present invention.
Figure 3 illustrates a method for assigning surrogate keys in the first medium according to the present invention.
Figure 4 illustrates a graphical representation of a second medium according to the present invention.
Figure 5 illustrates a method for partitioning the second medium according to the present invention.
Figure 6 illustrates a method for updating assigned surrogate key to the second medium according to the present invention.
Figure 7 illustrates a method for updating predefined column in each metric of facts with assigned surrogate keys according to the present invention.
Figure 8 illustrates a method for adding of a predefined column to a metric of facts according to the present invention.
Figure 9 illustrates a flowchart for achieving dynamic dimension process according to the present invention. DETAILED DESCRIPTION OF THE EMBODIMENTS
Described below are various preferred embodiments of the present invention with reference to the accompanying drawings. Each of the following preferred embodiments describes an example in which a method of data input into a database may be implemented and the improvements over existing prior art. The configuration of the invention is not limited to the configuration mentioned in the following description.
It will be appreciated that data is populated into a fact table in a database or data warehouse by first extracting data from a data source (1). This data in its natural key form will be assigned a surrogate key (9) representation. The data extracted from the data source (1) will then be transformed into a metric of facts (4) or also known as fact tables through intermediate mediums or bridge tables such as dimension tables. For
each row of data population, the measure is calculated and the aggregate value from multiple dimension tables will be loaded to fact tables in a data warehouse. This method of data population is carried out by mapping the aggregate value from a first medium (2) to a predefined metric of facts (4) preceding to transforming data to at least a metric of facts (4), and is capable of maintaining the structure of the metric of facts.
As demand increases by the management for more analysis on the data from the data warehouse, data sources (1) are provided with new additional columns. For each additional column or field in the data source (1), individual dimension tables will be generated to map the natural key with the newly assigned surrogate keys (9). Additional columns have to be created in the existing fact table to accommodate these additional fields. Therefore, changes to the existing extract, transform and load, ETL process from the data source (1) to the fact table are required. The process of transforming data from the data source (1) to dimension tables and finally loading data into to the fact tables has to be repeated for each fact table in the data warehouse with a plurality of fact tables. If the fact tables have large data size, this process will be time consuming. Furthermore, frequent changes on the columns of fact tables and ETL process will cause adverse impact on the database design. The abovementioned problems are solved with the introduction of the present invention. The present invention enables automation of dynamically adding dimensions into multiple fact tables with respect to the data source (1) defined whilst retaining the structure of the fact table. The present invention moreover discloses a method of data mapping that supports multiple fact tables using a prefixed field to reference the dynamic dimension table.
Referring to Figure 1, the figure illustrates a method for data input into a database according to the present invention comprising the steps of extracting data from a data source (1), categorizing said data in at least a first medium (2), providing at least a second medium (3), generating data mapping relationship from at least a first medium (2) to at least a second medium (3), transforming said data to at least a metric of facts (4) and loading the database with at least a metric of facts (4) from at least a second medium (3). Referring to Figure 1 and Figure 2, Figure 2 illustrates at least a first medium (2) shown in Figure 1 wherein at least a first medium categorizes data from the data source (1) according to the present invention. A key step of data input into the database is
through categorizing or defining data by at least a first medium (2) in a form of fact definition table. This said at least a first medium (2) or fact definition table has to be defined in order to facilitate the definition of one or more metric of facts (4) or fact tables to be added to the data warehouse.
Said at least a first medium (2) comprises a first plurality of fields (5) interlinking with one another for representing the data in categories (6). Said at least a first medium (2) or fact definition table comprises at least a field for identification of the metric of facts (4) (i.e. "Fact ID" column), a field for naming of the metric of facts (4) (i.e. "Fact Name" column), a field indicating the number of categories (6) to be mapped from the data source (1) (i.e. "num_dimension" column), a field for naming of the granularity of data (7) (i.e. "dimension" column) and a field for identifying the data source (1) (i.e. "data source" column). For each of these fields in the fact definition table, a user is to fill in the data accordingly.
The abovementioned "Fact ID" uniquely identifies the ID of the fact table in the data warehouse while the "Fact Name" identifies the name of the fact table. The number of categories (6) to be mapped from the data source (1) is defined by the "num_dimension" field. The "dimension" field enables a user to define multiple dimension table name and the column name mapping to the data source (1) table. The field name defined here has the same column name in the data source (1) table to enable matching mapping name. The "data source" column identifies the name of the table of the data source (1). Referring to Figure 3, the figure illustrates naming of the granularity of data (7) according to the present invention. The naming of the granularity of data (7) is registered under the "dimension" field mentioned above. This step involves assigning surrogate key (9) to be represented in said at least a second medium (3) prior to be represented in a metric of facts (4). A naming convention for the granularity of data (7) comprises a string of a plurality of category names and plurality of field names in the sequence of field name then category name as seen below:
<Dimension Namel xColumn Separator Begin xColumn Namel xColumn Separator End xSeparatorxDimension Name2xColumn Separator Begin xColumn Name2xColumn Separator End >.. <SeparatorxDimension Name(n) >
An example of the above naming convention is:
Ethnic'Src Ethnic ID' 11 Service Type'Src Service Type'
Referring to Figure 1 and Figure 4, Figure 4 illustrates at least a second medium (3) shown in Figure 1, comprising a second plurality of fields (8) interlinking with one another for generating data mapping relationship according to the present invention. Once the data has been categorized in at least a first medium (2) in that of the fact definition table, at least a second medium (3) is provided which generates data mapping relationship from said at least a first medium (2) to said at least a second medium (3). Said at least a second medium (3) also referred to as dynamic dimension table will be defined to facilitate transforming of data to one or more fact tables.
The dynamic dimension table comprises a field for identification of the categories (6) (i.e. "DynamicDim ID" column), a field mapping the identification of the metric of facts (4) from at least a first medium (2) (i.e. "Fact ID" column), a field containing surrogate key of categories (6) obtained from at least a first medium (2) (i.e. "key value" column) and a field for identifying name of the data (i.e. "dimension name" column).
The "DynamicDim ID" identifies the ID of each dynamic dimension or category from the data source (1). The "Fact ID" maps each fact table with the dynamic dimension table. The "key value" contains the surrogate key (9) of the dimension table. The surrogate key (9) is obtained from mapping of each element in the "dimension" column of the Fact definition table. Furthermore, 'dimension name' column identifies the dimension table name.
Referring to Figure 5, the figure illustrates a method for partitioning of said at least a second medium (3) based on numbers of each metric of facts (4) in the database according to the present invention. The figure clearly shows that said at least a second medium (3) in that of dynamic dimension table is partitioned based on the number of fact tables in the data warehouse (i.e. "Fact ID column). For the every fact table in the data warehouse, a predefined column (10) is added to map the surrogate key (9) of the dynamic dimension table.
Referring to Figure 6, the figure illustrates a method for the storing of mapping relationship from the data source (1) field to at least a second medium (3) through a first medium according to the present invention. The figure shows multiple "data source" fields such as service type dimension and ethnic dimension are mapped and stored in "DynamicDim" table in the steps of:
i. determining number of categories (6) or dimensions (num_dimension)
ii. parsing at least a second medium (3) or each dimension table and the field of "dimension" column, followed by reading of table name in "data source" column.
iii. assigning temporary variables to dimension table name, column name and data source (1) name
iv. executing function Num=Num+ l
The table name of the data source (1) from the database will then be read, initializing Numl(Numl = 0) Referring to Figure 7, the figure illustrates a method for assigning of surrogate keys (9) from said at least a second medium (3) to at least a metric of facts (4) according to the present invention. The multiple fact tables or metric of facts (4) are assigned with surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables stored, where for each row of data read from data source (1), row > Numl
i. assigning a surrogate key (9) to each unique combination of values based on each column name of dimension table name defined in the Fact Definition matching column name defined in the data source (1) ii. updating assigned surrogate key (9) to a field (ie: "Key Value" column) in the dynamic dimension table for each fact table
iii. updating a predefined column (10) (ie: "DynamicDim" column) in each fact table in the data warehouse with the assigned surrogate keys (9) of the dynamic dimension table
iv. executing Increment Numl =Numl + 1
Referring to figure 8, the figure illustrates a method for updating of a predefined "DynamicDim" column (10) added to existing fact table according to the present invention. A default value of less than zero (i.e. -1) is assigned to the predefined column (10) which means that when there is no value in the surrogate key (9) of the dynamic dimension, the function will not run.
Referring to Figure 9, the figure illustrates the steps for dynamically adding dimensions into a fact table in the form of a flowchart according to the present invention. Data obtained from multiple sources will be categorized into at least a first medium (2), in this case a fact definition table for mapping of relationship between multiple data source (1), column names and multiple dimension table column name for each fact table (21). If the number of rows in the fact definition table exceeds zero (22), the subsequent
process entails. Details or columns in the fact definition table will be read and a temporary variable is initialized to zero (23). The dynamic dimension table will be partitioned based on each fact table in the data warehouse (24). If numbers of dimension or fields exceeds the temporary variable, the process will loop (25). The mapping of the relationship between multiple data source (1) and multiple dimension table column will be stored through means of parsing the multiple dimension name, column name and data source (31) and then assigning temporary variables to them (32). The table name of the data source (1) from the database will be read, initializing Numl (Numl =0) (26). Finally, multiple fact tables will be assigned surrogate keys (9) utilizing the mapping of relationship between multiple data sources (1) and multiple dimension tables (41). The assigned surrogate keys (9) will be updated to a column in the dynamic dimension table (42) for each metric of fact (4). The data is then populated into the data base (1) through updating of each column in the metric of facts with assigned surrogate keys (43). The process flow will be repeated if the data source (1) is larger than Numl (27).
In as much as the present invention is subject to many variations, modifications and changes in detail, it is intended that all matter contained in the foregoing description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
Claims
A method for data input into a database comprising the steps of:
extracting data from a data source (1);
categorizing said data in at least a first medium (2);
providing at least a second medium (3);
generating data mapping relationship from at least a first medium
(2) to at least a second medium
(3);
transforming said data to at least a metric of facts (4); and
loading the database with said at least a metric of facts (4);
characterized in that the method further comprising a step of mapping a value from said at least a first medium (2) to a predefined metric in said at least a metric of facts (4) preceding to said step of transforming said data to at least a metric of facts
(4) which is capable of maintaining structure of the metric of facts.
The method of claim 1, wherein said at least a first medium (2) comprises a first plurality of fields (5) interlinking with one another for representing the data in categories.
The method of claim 2, wherein said first plurality of fields
(5) comprises a combination of at least a field for:
identifying the metric of facts (4);
naming the metric of facts (4);
indicating the number of categories (6) to be mapped from the data source;
naming the granularity of data (7) for forming a part of fields in said at least a second medium (3); and
identifying the data source (1).
The method of claim 3, wherein said granularity of data (7) comprises a string of a plurality of category (6) names and plurality of field names.
The method of claim 1, wherein said at least a second medium (3) comprises a second plurality of fields (8) interlinking with one another for generating data mapping relationship between said at least a first medium (2) and at least a second medium (3).
6. The method of claim 5, wherein said second plurality of fields (8) comprises a combination of at least a field for:
identifying the categories (6);
mapping the identification of the metric of facts (4) from at least a first medium (2);
storing surrogate key (9) of categories (6) obtained from said at least a first medium (2); and
identifying name of the data source (1).
7. The method of claim 6, wherein said at least a second medium (3) is partitioned based on numbers of each metric of facts (4) in the database.
8. The method of claim 1, wherein the mapping of relationship between data source (1) field and at least a second medium (3) is stored comprising the steps of: determining number of categories (6);
parsing said at least a second medium (3), the fields of categories (6) and identified data source (1);
assigning temporary variables to said at least a second medium (3), the fields of categories (6) and data source (1); and
repeating abovementioned steps for every number of categories (6).
9. The method of claim 8, wherein the data source (1) field is read and a recurring function initialized.
10. The method of claim 1, wherein said at least a metric of facts (4) is assigned with surrogate keys (9) by steps of:
assigning the surrogate keys (9) to each combination of values based on each field name of said at least a second medium (3) defined in the data source (i);
updating the surrogate keys (9) to said at least a second medium (3) for each of the metric of facts (4);
updating a predefined field (10) in each metric of facts (4) in the database with assigned surrogate keys (9) of said at least a second medium (3); and
executing a recurring function.
11. The method of claim 10, wherein said predefined (10) field in the existing metric of facts (4) is assigned a value of less than zero then a recurring function is executed.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
MYPI2014702153 | 2014-08-05 | ||
MYPI2014702153A MY187720A (en) | 2014-08-05 | 2014-08-05 | Method for data input into a database |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016022019A1 true WO2016022019A1 (en) | 2016-02-11 |
Family
ID=54145972
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/MY2015/050077 WO2016022019A1 (en) | 2014-08-05 | 2015-07-28 | Method for data input into a database |
Country Status (2)
Country | Link |
---|---|
MY (1) | MY187720A (en) |
WO (1) | WO2016022019A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947861A (en) * | 2017-11-09 | 2019-06-28 | 北京京东尚科信息技术有限公司 | The method, apparatus and computer-readable medium of object table are generated for data warehouse |
JP7051108B2 (en) | 2016-02-26 | 2022-04-11 | クリスプ インテリジェンス プロプライエタリ リミテッド | Fact Category Partitioning Unknown to Data Source Systems Methods for Inserting and Retrieving Data Using Information Repositories and Information Repositories |
CN114595294A (en) * | 2022-03-11 | 2022-06-07 | 北京梦诚科技有限公司 | Data warehouse modeling and extracting method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090055439A1 (en) | 2007-08-24 | 2009-02-26 | Ketera Technologies, Inc. | Flexible Dimension Approach In A Data Warehouse |
US20120109917A1 (en) | 2010-10-27 | 2012-05-03 | Comeau Gregory P | System and method for synchronously updating a hierarchy bridge table |
US8352458B2 (en) | 2008-05-07 | 2013-01-08 | Oracle International Corporation | Techniques for transforming and loading data into a fact table in a data warehouse |
US20140188784A1 (en) * | 2012-12-28 | 2014-07-03 | Joseph Guerra | Systems and methods for data-warehousing to facilitate advanced business analytic assessment |
-
2014
- 2014-08-05 MY MYPI2014702153A patent/MY187720A/en unknown
-
2015
- 2015-07-28 WO PCT/MY2015/050077 patent/WO2016022019A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090055439A1 (en) | 2007-08-24 | 2009-02-26 | Ketera Technologies, Inc. | Flexible Dimension Approach In A Data Warehouse |
US8352458B2 (en) | 2008-05-07 | 2013-01-08 | Oracle International Corporation | Techniques for transforming and loading data into a fact table in a data warehouse |
US20120109917A1 (en) | 2010-10-27 | 2012-05-03 | Comeau Gregory P | System and method for synchronously updating a hierarchy bridge table |
US20140188784A1 (en) * | 2012-12-28 | 2014-07-03 | Joseph Guerra | Systems and methods for data-warehousing to facilitate advanced business analytic assessment |
Non-Patent Citations (1)
Title |
---|
ANONYMOUS: "Extract, transform, load - Wikipedia, the free encyclopedia", 9 January 2013 (2013-01-09), XP055120766, Retrieved from the Internet <URL:http://wayback.archive.org/web/20130109062117/http://en.wikipedia.org/wiki/Extract,_transform,_load> [retrieved on 20140528] * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7051108B2 (en) | 2016-02-26 | 2022-04-11 | クリスプ インテリジェンス プロプライエタリ リミテッド | Fact Category Partitioning Unknown to Data Source Systems Methods for Inserting and Retrieving Data Using Information Repositories and Information Repositories |
CN109947861A (en) * | 2017-11-09 | 2019-06-28 | 北京京东尚科信息技术有限公司 | The method, apparatus and computer-readable medium of object table are generated for data warehouse |
CN114595294A (en) * | 2022-03-11 | 2022-06-07 | 北京梦诚科技有限公司 | Data warehouse modeling and extracting method and system |
Also Published As
Publication number | Publication date |
---|---|
MY187720A (en) | 2021-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9519687B2 (en) | Minimizing index maintenance costs for database storage regions using hybrid zone maps and indices | |
CN103577440B (en) | A kind of data processing method and device in non-relational database | |
JP7046172B2 (en) | Computer implementation methods, computer program products, and systems for storing records in shard database shard tables, computer implementation methods, computer program products, and systems for retrieving records from shard database shard tables. System, as well as a system for storing shard databases | |
US9576011B2 (en) | Indexing hierarchical data | |
JP7011848B2 (en) | Systems, methods, and data structures for fast retrieval or filtering of large datasets | |
US9665600B2 (en) | Method for implementing database | |
US20200004851A1 (en) | Trie-based indices for databases | |
US10452676B2 (en) | Managing database with counting bloom filters | |
US9129010B2 (en) | System and method of partitioned lexicographic search | |
US20180144061A1 (en) | Edge store designs for graph databases | |
US20140052727A1 (en) | Data processing for database aggregation operation | |
CN103002061A (en) | Method and device for mutual conversion of long domain names and short domain names | |
US20080294673A1 (en) | Data transfer and storage based on meta-data | |
WO2016022019A1 (en) | Method for data input into a database | |
CN114840487A (en) | Metadata management method and device for distributed file system | |
US8949282B1 (en) | Efficient storage of non-searchable attributes | |
CN116680278B (en) | Data processing method, device, electronic equipment and storage medium | |
CN104636471A (en) | Procedure code finding method and device | |
US20180144060A1 (en) | Processing deleted edges in graph databases | |
CN104750743A (en) | System and method for ticking and rechecking transaction files | |
CN106845787A (en) | A kind of data method for automatically exchanging and device | |
US8150887B1 (en) | Identifiers for non-searchable attributes | |
CN116450607A (en) | Data processing method, device and storage medium | |
US8930336B2 (en) | Retrieval of searchable and non-searchable attributes | |
US8171054B1 (en) | Optimized fetching for customization object attributes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15763988 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15763988 Country of ref document: EP Kind code of ref document: A1 |